Abstract. We give a sequential model for noninterference security including
probability (but not demonic choice), thus supporting reasoning about the
likelihood that high-security values might be revealed by observations of
low-security activity. Our novel methodological contribution is the definition
of a refinement order ($\sqsubseteq$) and its use to compare security measures
between specifications and (their supposed) implementations. This contrasts
with the more common practice of evaluating the security of individual programs
in isolation.

The appropriateness of our model and order is supported by our showing that
($\sqsubseteq$) is the greatest compositional relation (the compositional
closure) with respect to our semantics and an "elementary" order based on
Bayes Risk, a security measure already in widespread use. We also relate
refinement to other measures such as Shannon Entropy.

By applying the approach to a non-trivial example, the anonymous-majority
Three-Judges protocol, we demonstrate by example that correctness arguments
can be simplified by the sort of layered developments (through levels of
increasing detail) that are allowed and encouraged by compositional semantics.

1 Introduction 

We apply notions of testing equivalence and refinement, based on Bayes Risk, to
the topic of noninterference security [10] with probability but without demonic
choice. Previously, we have studied noninterference for demonic systems without
probabilistic choice [26, 27], and we have studied probability and demonic choice
without noninterference [28,21]. Here we thus complete a programme of
treating these features "pairwise."

Our long-term aim -as we explain in the conclusion- is to treat all three 
features together, based on the lessons we have learned by treating strict subsets 
of them. The benefit (should we succeed) would apply not only to security, 
but also to conventional program development where, in the presence of both 
probabilistic and demonic choice, the technique of data-transformation (aka. 
data refinement or data reification) becomes unexpectedly complex: variables 
inside local scopes must be treated analogously to "high security" variables in 
noninterference security. 

We take the view, learned from others, that program/system development
benefits from a comparison of specification programs with (putative)
implementations of them, wherever this is possible, via a mathematically
defined "refinement" relation whose formulation depends ultimately on a notion
of testing that is agreed to subjectively by all parties concerned [8].⁴ To
explain our position unambiguously, we begin by recalling the well-known
effects of this approach for conventional, sequential programming.

1.1 Elementary testing and refinement for conventional programs 

Consider sequential programs operating over a state-space of named variables
with fixed types, including a program abort that diverges (such as an infinite
loop). We allow demonic nondeterminism, in statements such as
$x := 0 \sqcap x := 1$, in the now-conventional way in which they represent
equally abstraction (we do not care whether x is assigned 0 or 1, as long as it
is one or the other), on the one hand, or unpredictable and arbitrary run-time
choice on the other.

Having determined a "specification" program $S$, we address the question of
whether we are prepared to accept some program $I$ that purports to "implement"
it. Although there is nowadays a widely accepted answer to this, we imagine that
we are considering the question for the first time and that we are hoping to find
an answer that everybody will accept. For that we search for a test on programs
that is "elementary" in the sense that it is conceptually simple and that no
"reasonable" person could ever argue that $S$ is implemented by $I$ if it is the
case that $S$ always passes the test but $I$ might fail it.⁵



⁴ We say "wherever this is possible" since there are many aspects of system
development that cannot be pinned down mathematically. But (we argue) those
that can be, should be.

⁵ There is a possible dichotomy here between "may testing" and "must testing,"
and we are taking the latter in this example: if $S$ must pass a certain test,
then so must $I$ if it is to be considered an implementation.

A common choice for such an elementary test is "can diverge," where divergence
is considered to be a bad thing: using it, our criterion becomes "if $I$ indeed
implements $S$ and $I$ can diverge, then it must be possible for $S$ to diverge
also." We note that the elementary test cannot be objectively justified: it is
an "axiom" of the approach that will be built on it; and it is via the
subjective axioms (in any approach) that we touch reality, where we avoid an
infinite definitional regress.

The elementary test provides an "only if" answer to the implementation
question, but not an "if." That is, we do not say that $I$ implements $S$ if
either $I$ never fails the test or $S$ might fail it: this is not practical,
because of context. For an example, let $S$ be if $x \neq 0$ then abort fi and
let $I$ be simply abort. Then indeed $S$ passes the test if $I$ does (because
they both fail); but we cannot accept $I$ generally as a replacement for $S$
because the context $x := 0;\ S$ "protects" $S$, and passes the test as a whole;
but the same context does not protect $I$, since $x := 0;\ I$ (still) fails.
This illustrates the inutility of the elementary view taken on its own, and it
shows that we need a more sophisticated comparison in order to have a practical
tool that respects contexts. (Thus it is clear above that we must add "if
executed from the same initial state.") The story leads on from here to a
definition, ultimately, of sequential-program refinement ($\sqsubseteq$) as the
unique relation such that⁶

(i) soundness If $S \sqsubseteq I$ then for all contexts $C$ we have that
    $C(I)$ passes the elementary test if $C(S)$ does, and
(ii) completeness If $S \not\sqsubseteq I$ then there is some context $C$ such
    that $C(I)$ fails the elementary test although $C(S)$ passes it.

That relation turns out to have the direct definition that $S \sqsubseteq I$
just when, for all initial states $s$, if executing $I$ from $s$ can deliver
some final state $s'$ then (from $s$ again) either $S$ can deliver $s'$ as
well, or $S$ can diverge. Crucially, it is the direct definition that allows
($\sqsubseteq$) to be determined without examining all possible contexts.

1.2 Elementary testing and refinement for probabilistic 
noninterference-secure programs 

In attempting to follow the trajectory of §1.1 into the modern context of nonin- 
terference and probability, we immediately run into the problem that there are 
competing notions of elementary test. Here are just four of them: 

Bayes Risk [34,5,1,2] is based on the probability an attacker can reveal a
    high-security, "hidden" variable $\mathsf{h}$ using a single query
    "Is $\mathsf{h}$ equal to $h$?" where $h$ is some value in $\mathsf{h}$'s
    type. Here (and below) the elementary testing of $S$ wrt. $I$ requires that
    the probability of revealing $\mathsf{h}$ in $I$ cannot be higher than it
    is in $S$.



⁶ We say "a" rather than "the" definition of refinement because this is just an
example: other elementary tests, and other possible contexts, lead naturally to
other definitions.

marginal guesswork [30, 15] is measured in terms of how many queries of the
    form "Is $\mathsf{h}$ equal to $h$?" are needed to determine
    $\mathsf{h}$'s value with a given probability.
Shannon Entropy [33] is related to the use of multiple queries of the form
    "Is $\mathsf{h}$ in some set $H$?" where $H$ is a subset of $\mathsf{h}$'s
    type.
guessing entropy [19, 15] is the average number of "Is $\mathsf{h}$ equal to
    $h$?" guesses necessary to determine $\mathsf{h}$'s value.

Not only do these criteria compete for popularity, it turns out that on their 
own they are not even objectively comparable. For instance, Pliam [30] finds 
that there can be no general ordering between marginal guesswork and Shannon 
Entropy: that is, from a marginal-guesswork judgement of whether $S$ passes all
tests that $I$ does, there is no way to determine whether the same would hold
for Shannon-entropy judgements, nor vice versa. Similarly, Smith has compared
Bayes Risk and Shannon Entropy, and claims that these measures are inconsis- 
tent in the same sense [34]. The general view seems to be that none of these 
(four) methods can be said to be generally more- or less discriminating than any 
of the others. 

In spite of the above, one of our contributions here is to show that Bayes Risk 
is maximally discriminating among those four if context is taken into account. 

1.3 Features of our approach: a summary 

Our most significant deviation from traditional noninterference is that, rather 
than calculating security measures of programs in isolation, instead we focus on 
comparing security measures between programs: typically one is supposed to be 
a specification, and another is supposed to be an implementation of it. What we 
are looking for is an implementation that is at least as secure as its specification. 

Since we never consider the security of programs in isolation, an advantage is 
that it is possible easily to arrange certain kinds of permissible information flow. 
For example whenever $s > i$ holds, a program $I$ that leaks only the $i$
low-order bits of a hidden integer h is secure with respect to a specification
$S$ that leaks the $s$ low-order bits of h; that is, for any implementation of
$S$, the leaking of
up to s low-order bits of h is allowed but no more. This way we sometimes can 
avoid separate tools for declassification: to allow an implementation to release 
(partial) information, we simply arrange that its specification does so. 

Typically it is both functional- and security properties (however we measure
them) that are of interest. As such, we would like to define a relation
($\sqsubseteq$) between these programs so that $S \sqsubseteq I$ just when
implementation $I$ has all the functional and the security properties that
specification $S$ does, where "all" is interpreted within our terms of
reference. For incremental, compositional reasoning with such an order, it has
been known from the very beginning [37] that the refinement relation
($\sqsubseteq$) must satisfy two key technical properties:

Transitivity If $S \sqsubseteq M \sqsubseteq I$ then also $S \sqsubseteq I$.
    Because of this a comparison between two large programs $S, I$ can be
    carried out via $S \sqsubseteq M_1 \sqsubseteq \cdots \sqsubseteq M_N
    \sqsubseteq I$ through many small steps over a long time.
Monotonicity of contexts If $S \sqsubseteq I$ then also
    $C(S) \sqsubseteq C(I)$, where $C$ is any program context. Because of this,
    a large comparison can be carried out via many small steps independently by
    a large programming team working in parallel.

As argued above, since our comparisons rest ultimately on subjective criteria
for failure, we reduce that dependency on what is essentially an arbitrary choice
by making those criteria as elementary as possible: when can you be absolutely
sure that $S \not\sqsubseteq I$, that refinement should fail? For this purpose
we identify an elementary testing relation ($\preceq$) based on Bayes Risk, such
that if $S \not\preceq I$ then $I$ "certainly" (but still subjectively) does not
satisfy the specification $S$ in terms of "reasonable" functional- and
probabilistically secure properties.

Because our ($\preceq$) is not respected by all contexts (there exist programs
$S, I$ and context $C$ such that $S \preceq I$, yet $C(S) \not\preceq C(I)$ in
spite of that) our relation ($\sqsubseteq$) is chosen so that it is smaller,
i.e. more restrictive, than ($\preceq$), so that it excludes just those
"apparent" refinements that can be voided by context.

Our refinement relation is the compositional closure of ($\preceq$), the largest
relation ($\sqsubseteq$) such that $S \sqsubseteq I$ implies
$C(S) \preceq C(I)$ for all possible contexts $C$. Abusing terminology slightly,
we will for simplicity say that ($\sqsubseteq$) is compositional just when it is
respected by all possible contexts $C$ (whereas strictly speaking we should say
that all such $C$s are ($\sqsubseteq$)-monotonic). Further, we note that if we
define equivalence $A \sim B$ to be "bi-refinement" $A \sqsubseteq B$ and
$B \sqsubseteq A$ then monotonicity of ($\sqsubseteq$) implies that ($\sim$) is
a congruence for all contexts $C$.

There are two further, smaller idiosyncrasies of our approach. The first is 
that we allow the high-security, "hidden" variables to be assigned-to by the pro- 
gram, so that it is the secrecy of the final value h' of h that is of concern to 
us, not the initial value h. This is because we could not otherwise meaningfully 
compare functional properties, nor would we be able to treat (sequential) com- 
positional contexts. The other difference, more a position we take, is that we 
allow an attacker both perfect recall and an awareness of implicit flow, that the 
intermediate values of low-security "visible" program variables are observable, 
even if subsequently overwritten; and that the control-flow of non-atomic pro- 
gram statements is observable. As shown in our case study (§8.3) it is this which 
allows us to model distributed applications: there, the values of intermediate 
variables can be observed (and recalled) if they are sent on an insecure channel, 
and the control flow of a program may be witnessed (for example) by observing 
which request an agent is instructed to fulfill. 

In summary, our technical contribution is that we (i) give a sequential
semantics for probabilistic noninterference, (ii) define the above order
($\preceq$) based on Bayes Risk, (iii) show it is not compositional, (iv)
identify a compositional subset of it, a refinement order ($\sqsubseteq$) such
that $S \sqsubseteq I$ implies $C(S) \preceq C(I)$ for all contexts $C$, and
(v) show that ($\sqsubseteq$) is in fact the compositional closure of
($\preceq$), so that in fact we have $S \not\sqsubseteq I$ only when
$C(S) \not\preceq C(I)$ for some $C$.

Finally, we note (vi) that ($\sqsubseteq$) is sound for the other three,
competing notions of elementary test and that therefore Bayes-Risk testing,
with context, is maximally discriminating among them.

These technical contributions further our general goal of structuring secure
protocols hierarchically and then designing/verifying them in separate pieces,
a claim that we illustrate by showing how our model and our secure-program
ordering may be used to give an incremental development of The Three Judges,
an "anonymous majority" protocol we constructed precisely to make this point.

2 A probabilistic, noninterference sequential semantics 

We identify visible variables (low-security), typically $\mathsf{v}$ in some
finite type $\mathcal{V}$, and hidden variables (high-security), typically
$\mathsf{h}$ in finite $\mathcal{H}$. Variables are in sans serif to distinguish
them from (decorated) values $v\colon\mathcal{V}$, $h\colon\mathcal{H}$ they
might contain.⁷

As an example, let hidden $\mathsf{h}\colon\{0,1,2\}$ represent one of three
boxes: Box 0 has two black balls; Box 1 has one black- and one white ball; and
Box 2 has two white balls. Then let $\mathsf{v}\colon\{w, b, \bot\}$ represent
a ball colour: white, black or unknown. Our first experiment in this system is
Program $S$, informally written

    $\mathsf{h} := 0 \oplus 1 \oplus 2;\ \ \mathsf{v} :\in \{\!\{w^{@\mathsf{h}/2}, b^{@1-\mathsf{h}/2}\}\!\};\ \ \mathsf{v} := \bot$,

that chooses box h uniformly, and then draws a ball v from that Box h: from the
description above (and the code) we can see that with probability $h/2$ the
ball is white, and with probability $1 - h/2$ it is black. Then the ball is
replaced. A typical security concern is "How much information about h is
revealed by its assignments to v?"

We use this program, and that question, to motivate our program syntax and
semantics, to make the above Program $S$ precise and to provide the framework
for asking (and answering) such security questions.

We begin by introducing distribution notation, generalising the notations for 
naive set theory. 

2.1 Distributions: explicit, implicit and expected values over them 

We write function application as $f.x$, with "." associating to the left.
Operators without their operands are written between parentheses, as
($\preceq$) for example. Set comprehensions are written as
$\{s\colon S \mid G \cdot E\}$, meaning the set formed by instantiating bound
variable $s$ in the expression $E$ over those elements of $S$ satisfying
formula $G$.⁸

By $\underline{\mathbb{D}}S$ we mean the set of discrete sub-distributions on
set $S$ that sum to no more than one, and $\mathbb{D}S$ means the full
distributions that sum to one exactly. The support $\lceil\delta\rceil$ of
(sub-)distribution $\delta\colon\underline{\mathbb{D}}S$ is those elements $s$
in $S$ with $\delta.s \neq 0$, and the weight $\sum\delta$ of a distribution is
$(\sum s\colon\lceil\delta\rceil \cdot \delta.s)$, so that full distributions have



⁷ We say hidden and visible, rather than high- and low security, because of the
connection with data refinement where the same technical issues occur but there
are no security implications.

⁸ This is a different order from the usual notation
$\{E \mid s \in S \land G\}$, but we have good reasons for using it:
calculations involving both sets and quantifications are made more reliable by
a careful treatment of bound variables and by arranging that the order S/G/E is
the same in both comprehensions and quantifications (as in
$(\forall s\colon S \mid G \cdot E)$ and $(\exists s\colon S \mid G \cdot E)$).

weight 1. Distributions can be scaled and summed according to the usual
pointwise extension of arithmetic to real-valued functions, so that
$(c * \delta).s$ is $c * (\delta.s)$ for example; the normalisation of a
(sub-)distribution $\delta$ is defined $[\delta] := \delta / \sum\delta$.
Here are our notations for explicit distributions (cf. set enumerations):

multiple We write $\{\!\{x^{@p}, y^{@q}, \cdots, z^{@r}\}\!\}$ for the
    distribution assigning probabilities $p, q, \cdots, r$ to elements
    $x, y, \cdots, z$ respectively, with $p + q + \cdots + r \leq 1$.

uniform When explicit probabilities are omitted they are uniform: thus
    $\{\!\{x\}\!\}$ is the point distribution $\{\!\{x^{@1}\}\!\}$, and
    $\{\!\{x, y, z\}\!\}$ is $\{\!\{x^{@1/3}, y^{@1/3}, z^{@1/3}\}\!\}$. And
    $\delta_1 \oplus \delta_2$ is $\delta_1 \;{}_{1/2}{\oplus}\; \delta_2$.

In general, we write $(\odot\, d\colon\delta \cdot E)$ for the expected value
$(\sum d\colon\lceil\delta\rceil \cdot \delta.d * E)$ of expression $E$
interpreted as a random variable in $d$ over distribution $\delta$.⁹ If however
$E$ is Boolean, then it is taken to be 1 if $E$ holds and 0 otherwise: thus in
that case $(\odot\, d\colon\delta \cdot E)$ is the combined probability in
$\delta$ of all elements $d$ that satisfy $E$.

We write implicit distributions (cf. set comprehensions) as
$\{\!\{d\colon\delta \mid R \cdot E\}\!\}$, for distribution $\delta$, real
expression $R$ and expression $E$, meaning

    $(\odot\, d\colon\delta \cdot R * \{\!\{E\}\!\}) \;/\; (\odot\, d\colon\delta \cdot R)$        (1)

where, first, an expected value is formed in the numerator by scaling and adding
point-distribution $\{\!\{E\}\!\}$ as a real-valued function: this gives another
distribution. The scalar denominator then normalises to give a distribution yet
again. A missing $E$ is implicitly $d$ itself. If $R$ is missing, however, then
$\{\!\{d\colon\delta \cdot E\}\!\}$ is just
$(\odot\, d\colon\delta \cdot \{\!\{E\}\!\})$; in that case we do not multiply
by $R$ in the numerator, nor do we divide (by anything).

Thus $\{\!\{d\colon\delta \cdot E\}\!\}$ maps expression $E$ in $d$ over
distribution $\delta$ to make a new distribution on $E$'s type. When $R$ is
present, and Boolean, it is converted to 0,1; thus in that case
$\{\!\{d\colon\delta \mid R\}\!\}$ is $\delta$'s conditioning over formula $R$
as predicate on $d$.

Finally, for Bayesian belief revision we let $\delta$ be an a-priori
distribution over some $D$, and we let expression $R$ for each $d$ in $D$ be the
probability of a certain subsequent result if that $d$ is chosen. Then
$\{\!\{d\colon\delta \mid R\}\!\}$ is the a-posteriori distribution over $D$
when that result actually occurs. Thus in the three-box program $S$ let the
value first assigned to $\mathsf{v}$ be $v$. The a-priori distribution over
$\mathsf{h}$ is uniform, and the probability that the chosen ball is white, that
$v = w$, is therefore $1/3 * (0/2 + 1/2 + 2/2) = 1/2$. But the a-posteriori
distribution of $\mathsf{h}$ given that $v = w$ is
$\{\!\{h\colon\delta \mid h/2\}\!\}$, which from (1) we can evaluate

    $= (\odot\, h\colon\{\!\{0,1,2\}\!\} \cdot h/2 * \{\!\{h\}\!\}) \;/\; (\odot\, h\colon\{\!\{0,1,2\}\!\} \cdot h/2) \;=\; \{\!\{1^{@1/6}, 2^{@1/3}\}\!\} \;/\; \tfrac{1}{2}$,

that is $\{\!\{1^{@1/3}, 2^{@2/3}\}\!\}$, to calculate our way to the conclusion
that if a white ball is drawn ($v = w$) then the chance it came from Box 2 is
2/3, the probability of $h = 2$ in the a-posteriori distribution.
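
The belief-revision calculation above is small enough to replay with exact rational arithmetic. The sketch below is illustrative only: the function name `posterior` and the dictionary encoding of distributions are our assumptions, not notation from the text.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Condition `prior` (dict: value -> probability) on a result whose
    probability, given value d, is likelihood(d); this mirrors formula (1)."""
    weighted = {d: p * likelihood(d) for d, p in prior.items()}
    total = sum(weighted.values())  # the scalar denominator of (1)
    return {d: w / total for d, w in weighted.items() if w != 0}

# Uniform a-priori distribution over the three boxes.
prior = {h: Fraction(1, 3) for h in (0, 1, 2)}

# Box h yields a white ball with probability h/2.
white = posterior(prior, lambda h: Fraction(h, 2))
print(white)
```

The result assigns 1/3 to Box 1 and 2/3 to Box 2, and drops Box 0 from the support, in agreement with the a-posteriori distribution derived above.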



⁹ It is a dot-product between the distribution and the random variable as
state-vectors.

2.2 Program denotations over a visible/hidden "split" state-space 

We account for the visible and hidden partitioning of the finite state space
$\mathcal{V}\times\mathcal{H}$ in our new model by building split-states of
type $\mathcal{V}\times\mathbb{D}\mathcal{H}$, whose typical element
$(v,\delta)$ indicates that we know $\mathsf{v} = v$ exactly, but that all we
know about $\mathsf{h}$ (which is not directly observable) is that it takes
value $h$ with probability $\delta.h$.

Programs become functions
$(\mathcal{V}\times\mathbb{D}\mathcal{H}) \to \mathbb{D}(\mathcal{V}\times\mathbb{D}\mathcal{H})$
from split-states to distributions over them, called hyper-distributions since
they are distributions with other distributions inside them: the outer
distribution is directly visible but the inner distribution(s) over
$\mathcal{H}$ are not. Thus for a program $P$ with semantics $[\![P]\!]$, the
application $[\![P]\!].(v,\delta)$ is the distribution of final split-states
produced from initial $(v,\delta)$. Each $(v',\delta')$ in the support of that
outcome, with probability $p$ say in the outer- (left-hand) $\mathbb{D}$ in
$\mathbb{D}(\mathcal{V}\times\mathbb{D}\mathcal{H})$, means that with
probability $p$ an attacker will observe that $\mathsf{v}$ is $v'$ and
simultaneously will be able to deduce (via the explicit observation of $v$ and
$v'$ and other implicit observations) that $\mathsf{h}$ has distribution
$\delta'$.

When applied to hyper-distributions, addition, scaling and probabilistic choice
(${}_p\oplus$) are to be interpreted as operations on the outer distributions
(as explained in §2.1).



2.3 Program syntax and semantics 

The programming language semantics is given in Fig. 1. In this presentation we
do not treat loops and, therefore, all our programs are terminating.

When we refer to classical semantics, we mean the interpretation of a program
without distinguishing its visible and hidden variables, thus as a "relation"
of type
$(\mathcal{V}\times\mathcal{H}) \to \mathbb{D}(\mathcal{V}\times\mathcal{H})$.¹⁰

Atomic commands Syntactically atomic program (fragments), noted • in
Fig. 1, are first interpreted with respect to their classical probabilistic
semantics, and are then embedded into the split-state model. To emphasise that
they are syntactically atomic, we call them "$A$" (rather than "$P$") in this
section.

Thus the first step is to interpret an atomic program $A$ as a function from
$\mathcal{V}\times\mathcal{H}$-pairs to distributions
$\mathbb{D}(\mathcal{V}\times\mathcal{H})$ of them [16,21]; call that classical
interpretation $[\![A]\!]_C$, so that for an initial $(v,h)$ program $A$
produces a final distribution $[\![A]\!]_C.(v,h)$, that is some distribution
$\delta' \in \mathbb{D}(\mathcal{V}\times\mathcal{H})$.

Given such a distribution $\delta'$, define its $v$-projection
$\mathsf{vProj}.\delta'$ to be given by
$\{\!\{(v,h)\colon\delta' \cdot v\}\!\}$, that is the distribution over
$\mathcal{V}$, alone, that $\delta'$ defines if we ignore (and aggregate) the
$h$-components for each distinct $v$.

Then define for $\delta'$ its $v'$-conditioning
$\mathsf{vCond}.\delta'.v'$, that is the distribution
$\{\!\{(v,h)\colon\delta' \mid v = v' \cdot h\}\!\}$ over $\mathcal{H}$ that we
get by concentrating on a particular value $v'$.

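¹⁰ Classical relational and non-probabilistic semantics over a state-space
$\mathcal{V}\times\mathcal{H}$ is strictly speaking
$(\mathcal{V}\times\mathcal{H}) \leftrightarrow (\mathcal{V}\times\mathcal{H})$
or equivalently $\mathbb{P}((\mathcal{V}\times\mathcal{H})^2)$. Further
formulations include however both
$(\mathcal{V}\times\mathcal{H}) \to \mathbb{P}(\mathcal{V}\times\mathcal{H})$
and $\mathcal{V} \to \mathcal{H} \to \mathbb{P}(\mathcal{V}\times\mathcal{H})$.
Because all these are essentially the same, we call
$(\mathcal{V}\times\mathcal{H}) \to \mathbb{D}(\mathcal{V}\times\mathcal{H})$ a
"relational" semantics.
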
Program type          Program text $P$                          Semantics $[\![P]\!].(v,\delta)$

Identity              $\mathsf{skip}$                           $\{\!\{(v,\delta)\}\!\}$  •

Assign to visible     $\mathsf{v} := E.\mathsf{v}.\mathsf{h}$   $\{\!\{h\colon\delta \cdot (E.v.h,\ \{\!\{h'\colon\delta \mid E.v.h' = E.v.h\}\!\})\}\!\}$  •

Assign to hidden      $\mathsf{h} := E.\mathsf{v}.\mathsf{h}$   $\{\!\{(v,\ \{\!\{h\colon\delta \cdot E.v.h\}\!\})\}\!\}$  •

Choose prob. visible  $\mathsf{v} :\in D.\mathsf{v}.\mathsf{h}$ $\{\!\{v'\colon(\odot\, h\colon\delta \cdot D.v.h) \cdot (v',\ \{\!\{h'\colon\delta \mid D.v.h'.v'\}\!\})\}\!\}$  •

Choose prob. hidden   $\mathsf{h} :\in D.\mathsf{v}.\mathsf{h}$ $\{\!\{(v,\ (\odot\, h\colon\delta \cdot D.v.h))\}\!\}$  •

Composition           $P_1; P_2$                                $(\odot\, (v',\delta')\colon [\![P_1]\!].(v,\delta) \cdot [\![P_2]\!].(v',\delta'))$

General prob. choice  $P_1 \;{}_{q.\mathsf{v}.\mathsf{h}}{\oplus}\; P_2$
                                                                $p * [\![P_1]\!].(v,\ \{\!\{h\colon\delta \mid q.v.h\}\!\})$
                                                                $+\ (1-p) * [\![P_2]\!].(v,\ \{\!\{h\colon\delta \mid 1 - q.v.h\}\!\})$,
                                                                where $p$ is $(\odot\, h\colon\delta \cdot q.v.h)$

Probabilistic choice  $P_1 \;{}_{p}{\oplus}\; P_2$              $p * [\![P_1]\!].(v,\delta) + (1-p) * [\![P_2]\!].(v,\delta)$,
                                                                where $p$ is constant

Conditional choice    if $G.\mathsf{v}.\mathsf{h}$ then $P_t$   $p * [\![P_t]\!].(v,\ \{\!\{h\colon\delta \mid G.v.h\}\!\})$
                      else $P_f$ fi                             $+\ (1-p) * [\![P_f]\!].(v,\ \{\!\{h\colon\delta \mid \neg G.v.h\}\!\})$,
                                                                where $p$ is $(\odot\, h\colon\delta \cdot G.v.h)$

For simplicity let $\mathcal{V}$ and $\mathcal{H}$ have the same type
$\mathcal{X}$. Expression $E.v.h$ is then of type $\mathcal{X}$, distribution
$D.v.h$ is of type $\mathbb{D}\mathcal{X}$ and expression $G.v.h$ is Boolean.
Expressions $p$ and $q.v.h$ are of type $[0, 1]$.

The syntactically atomic commands marked • have semantics calculated by taking
the classical meaning and then applying Def. 1. The third column for •'d
commands is the result of doing that.

Further, the Assign-to semantics are special cases of the Choose-prob.
semantics, obtained by making the distribution $D$ equal to the point
distribution $\{\!\{E\}\!\}$. And the (simple) probabilistic choice is a special
case of the general prob. choice, taking $q.v.h$ to be the constant function
always returning $p$. Finally, conditional choice is the special case of general
prob. choice obtained by taking $q.v.h$ to be 1 when $G.v.h$ holds and 0
otherwise.

For distributions in program texts we allow the more familiar infix notation
${}_p\oplus$, so that we can write $\mathsf{h} := 0 \;{}_{p}{\oplus}\; 1$ for
$\mathsf{h} :\in \{\!\{0^{@p}, 1^{@1-p}\}\!\}$ and $\mathsf{h} := 0 \oplus 1$
for the uniform $\mathsf{h} :\in \{\!\{0, 1\}\!\}$. The degenerate cases
$\mathsf{h} := 0$ and $\mathsf{h} :\in \{\!\{0\}\!\}$ are then equivalent, as
they should be.

Fig. 1. Split-state semantics of commands



With these two preliminaries, the distribution over
$\mathcal{V}\times\mathbb{D}\mathcal{H}$ we get by interpreting $\delta'$
atomically is defined

    $\mathsf{embed}.\delta' := \{\!\{v'\colon\mathsf{vProj}.\delta' \cdot (v',\ \mathsf{vCond}.\delta'.v')\}\!\}$ ,

which is in essence just the "grouping together" of all elements $(v', h')$ in
$\delta'$ that have the same $v'$.

There are two routine steps left to finish off the embedding of whole programs; 
and they are given here in Def. 1: 

Definition 1. Induced secure semantics for atomic programs Given a
syntactically atomic program $A$ we define its induced secure semantics
$[\![A]\!]$ via

    $[\![A]\!].(v,\delta) := \mathsf{embed}.(\odot\, h\colon\delta \cdot [\![A]\!]_C.(v,h))$ .        (2)

Thus $A$ is applied to the incoming distribution $(v,\delta)$ by applying its
classical meaning $[\![A]\!]_C$ to each $(v,h)$-pair separately, noting that
pair's implied weight, and then using those weights to combine the resulting
$(v',h')$-distributions into a single distribution $\delta'$ of type
$\mathbb{D}(\mathcal{V}\times\mathcal{H})$. That distribution $\delta'$ is then
embedded into the split-state model as above.

The effect overall is that an embedding imposes the largest possible ignorance
of $h'$ that is consistent with seeing $v'$ and knowing the classical semantics
$[\![A]\!]_C$.
□
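
As a rough illustration of Def. 1, the following sketch averages a classical atomic command over the incoming inner distribution and then groups the joint result by visible value. The encoding is an assumption made for this example only: distributions are dictionaries, and because an atomic command yields exactly one inner distribution per visible value, the hyper-distribution is represented as a dictionary from $v'$ to a pair (outer probability, inner distribution). The names `embed`, `induced` and `assign_v_h_mod_2` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def embed(joint):
    """Group a joint distribution {(v, h): prob} by v, returning
    {v: (probability of v, conditional distribution of h given v)}."""
    by_v = defaultdict(dict)
    for (v, h), p in joint.items():
        if p != 0:
            by_v[v][h] = by_v[v].get(h, 0) + p
    hyper = {}
    for v, hs in by_v.items():
        weight = sum(hs.values())                     # vProj at v
        hyper[v] = (weight, {h: p / weight for h, p in hs.items()})  # vCond
    return hyper

def induced(classical, v, delta):
    """Average the classical outcomes over delta, then embed (cf. (2))."""
    joint = defaultdict(Fraction)
    for h, p in delta.items():
        for outcome, q in classical(v, h).items():
            joint[outcome] += p * q
    return embed(joint)

def assign_v_h_mod_2(v, h):
    """Classical meaning of v := h mod 2, as a point distribution."""
    return {(h % 2, h): Fraction(1)}

uniform = {h: Fraction(1, 3) for h in (0, 1, 2)}
result = induced(assign_v_h_mod_2, 0, uniform)
print(result)
```

Under these assumptions, seeing visible value 0 (probability 2/3) leaves the attacker with a uniform belief over {0, 2}, while seeing 1 (probability 1/3) reveals that the hidden value is 1.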

We illustrate the definitions in Fig. 1 by looking at some simple examples. 

Program skip modifies neither v nor h, nor does it change an attacker's
knowledge of h. Assignments to v or h can use an expression $E.v.h$ or a
distribution $D.v.h$; and assignments to v might reveal information about h.
For example, from Fig. 1 we can explore various assignments to v:

(i) A direct assignment of h to v reveals everything about h:
    $[\![\mathsf{v} := \mathsf{h}]\!].(v,\delta) = \{\!\{h\colon\delta \cdot (h, \{\!\{h\}\!\})\}\!\}$
(ii) Choosing v from a distribution independent of h reveals nothing about h.

(iii) Partially h-dependent assignments to v might reveal something about h:
    $[\![\mathsf{v} := \mathsf{h} \bmod 2]\!].(v, \{\!\{0,1,2\}\!\}) = \{\!\{(0, \{\!\{0,2\}\!\})^{@2/3},\ (1, \{\!\{1\}\!\})^{@1/3}\}\!\}$

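As a further illustration, we calculate the effect of the first assignment to v
in Program $S$ as follows:

      $[\![\mathsf{v} :\in \{\!\{w^{@\mathsf{h}/2}, b^{@1-\mathsf{h}/2}\}\!\}]\!].(v, \{\!\{0,1,2\}\!\})$

    = $\{\!\{v'\colon 1/3 * (\{\!\{b\}\!\} + \{\!\{w, b\}\!\} + \{\!\{w\}\!\}) \cdot (v',\ \{\!\{h'\colon\{\!\{0,1,2\}\!\} \mid \{\!\{w^{@h'/2}, b^{@1-h'/2}\}\!\}.v'\}\!\})\}\!\}$        "Choose prob. visible"

    =        "simplify the summation"
      $\{\!\{v'\colon\{\!\{w, b\}\!\} \cdot (v',\ \{\!\{h'\colon\{\!\{0,1,2\}\!\} \mid \{\!\{w^{@h'/2}, b^{@1-h'/2}\}\!\}.v'\}\!\})\}\!\}$

    =        "evaluate outer comprehension"
      $\{\!\{(w,\ \{\!\{h'\colon\{\!\{0,1,2\}\!\} \mid h'/2\}\!\}),\ (b,\ \{\!\{h'\colon\{\!\{0,1,2\}\!\} \mid 1 - h'/2\}\!\})\}\!\}$

    = $\{\!\{(w,\ \{\!\{1^{@1/3}, 2^{@2/3}\}\!\}),\ (b,\ \{\!\{0^{@2/3}, 1^{@1/3}\}\!\})\}\!\}$ .        "evaluate conditional distributions"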

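As for assignments to h, we see that they affect $\delta$ directly; thus
Choosing hidden h might

(iv) increase our uncertainty of h:
    $[\![\mathsf{h} := 0 \oplus 1 \oplus 2]\!].(v, \{\!\{0,1\}\!\}) = \{\!\{(v, \{\!\{0,1,2\}\!\})\}\!\}$
(v) or reduce it:
    $[\![\mathsf{h} := 0 \oplus 1]\!].(v, \{\!\{0,1,2\}\!\}) = \{\!\{(v, \{\!\{0,1\}\!\})\}\!\}$
(vi) or leave it unchanged:
    $[\![\mathsf{h} := 2 - \mathsf{h}]\!].(v, \{\!\{0,1,2\}\!\}) = \{\!\{(v, \{\!\{2,1,0\}\!\})\}\!\}$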

In all of the above, the assignment statements were atomic: an attacker may not
directly witness the evaluation of their right-hand sides. For instance, the
atomic probabilistic choice $\mathsf{v} := \mathsf{h} \oplus \neg\mathsf{h}$
does not reveal which of the equally likely operands of ($\oplus$) was used.

Non-atomic commands The first, Composition $P_1; P_2$, gives an attacker
perfect recall after $P_2$ of the visible variable v as it was after $P_1$,
even if $P_2$ overwrites v.¹¹ To see the effects of this, we compare the
three-box Program $S$ from the start of §2, that is

    $\mathsf{h} := 0 \oplus 1 \oplus 2;\ \ \mathsf{v} :\in \{\!\{w^{@\mathsf{h}/2}, b^{@1-\mathsf{h}/2}\}\!\};\ \ \mathsf{v} := \bot$ ,

with the simpler Program $I_1$ defined
$\mathsf{h} := 0 \oplus 1 \oplus 2;\ \mathsf{v} := \bot$ in which no ball is
drawn: the final hyper-distributions are respectively

    $\{\!\{(\bot,\ \{\!\{1^{@1/3}, 2^{@2/3}\}\!\}),\ (\bot,\ \{\!\{0^{@2/3}, 1^{@1/3}\}\!\})\}\!\}$        ($\Delta'_S$)

and $\{\!\{(\bot,\ \{\!\{0,1,2\}\!\})\}\!\}$.        ($\Delta'_{I_1}$)

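We calculated $\Delta'_S$ as follows:

      $\Delta'_S$

    = $[\![\mathsf{v} :\in \{\!\{w^{@\mathsf{h}/2}, b^{@1-\mathsf{h}/2}\}\!\};\ \mathsf{v} := \bot]\!].(v, \{\!\{0,1,2\}\!\})$        "Choose hidden; Composition"

    = $(\odot\, (v',\delta')\colon [\![\mathsf{v} :\in \{\!\{w^{@\mathsf{h}/2}, b^{@1-\mathsf{h}/2}\}\!\}]\!].(v, \{\!\{0,1,2\}\!\}) \cdot [\![\mathsf{v} := \bot]\!].(v',\delta'))$        "Composition"

    =        "assignment $\mathsf{v} := \bot$ independent of h"
      $(\odot\, (v',\delta')\colon [\![\mathsf{v} :\in \{\!\{w^{@\mathsf{h}/2}, b^{@1-\mathsf{h}/2}\}\!\}]\!].(v, \{\!\{0,1,2\}\!\}) \cdot \{\!\{(\bot,\delta')\}\!\})$

    =        "Choose prob. visible (see earlier calculation)"
      $(\odot\, (v',\delta')\colon \{\!\{(w, \{\!\{1^{@1/3}, 2^{@2/3}\}\!\}),\ (b, \{\!\{0^{@2/3}, 1^{@1/3}\}\!\})\}\!\} \cdot \{\!\{(\bot,\delta')\}\!\})$

    = $\{\!\{(\bot,\ \{\!\{1^{@1/3}, 2^{@2/3}\}\!\}),\ (\bot,\ \{\!\{0^{@2/3}, 1^{@1/3}\}\!\})\}\!\}$ .        "evaluate expected value"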

In neither case $\Delta'_S$ nor $\Delta'_{I_1}$ does the final value $\bot$ of
v reveal anything about h. But $\Delta'_{I_1}$ is a point (outer) distribution
(thus concentrated on a single split-state), whereas $\Delta'_S$ is a uniform
distribution over two split-states each of which recalls implicitly the
observation of an intermediate value $v$ of $\mathsf{v}$ that was made during
the execution leading to that state. Generally, if two split-states
$(v', \delta'_1)$ and $(v', \delta'_2)$ occur with
$\delta'_1 \neq \delta'_2$ then it means an attacker can deduce whether h's
distribution is $\delta'_1$ or $\delta'_2$ even though v has the same final
value $v'$ in both cases. Although the direct evidence $v$ has been
overwritten, the distinct split-states preserve the attacker's deductions from
it.
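
The effect of perfect recall under Composition can be seen in a few lines of code. This is a sketch under assumptions of our own: a hyper-distribution is a dictionary from split-states (a visible value paired with a frozen inner distribution) to outer probabilities, and `compose`, `freeze` and `overwrite_v` are hypothetical helper names. The hyper-distribution after the draw is taken from the "Choose prob. visible" calculation earlier in this section.

```python
from fractions import Fraction

def freeze(delta):
    """Make an inner distribution hashable so split-states can be dict keys."""
    return tuple(sorted(delta.items()))

def compose(hyper, second):
    """Composition row of Fig. 1: run `second` from each split-state of
    `hyper` and average the resulting hyper-distributions."""
    out = {}
    for (v, frozen), p in hyper.items():
        for state, q in second(v, dict(frozen)).items():
            out[state] = out.get(state, 0) + p * q
    return out

def overwrite_v(v, delta):
    """v := bottom; independent of h, so the inner distribution is kept."""
    return {("bot", freeze(delta)): Fraction(1)}

third, half = Fraction(1, 3), Fraction(1, 2)
# Program S after the ball has been drawn.
after_draw = {
    ("w", freeze({1: third, 2: 2 * third})): half,
    ("b", freeze({0: 2 * third, 1: third})): half,
}
# Program I1 draws no ball, so h is still uniform before v is overwritten.
before_i1 = {("bot", freeze({0: third, 1: third, 2: third})): Fraction(1)}

final_s = compose(after_draw, overwrite_v)
final_i1 = compose(before_i1, overwrite_v)
print(len(final_s), len(final_i1))
```

Both final hyper-distributions show only the value "bot" for v, yet the one for Program S still has two distinct split-states: overwriting v does not merge the attacker's two possible states of knowledge.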

The meaning of General prob. choice
$P_1 \;{}_{p.\mathsf{v}.\mathsf{h}}{\oplus}\; P_2$ (of which both Probabilistic
choice and Conditional choice are specific instances) makes it behave like
$[\![P_1]\!]$ with probability $p.v.h$ and $[\![P_2]\!]$ with the remaining
probability. The definition allows an attacker to observe which branch was
taken and, knowing that, she might be able to deduce new facts about h. Thus
unlike for (v) above we have
$[\![\mathsf{h} := 0 \ \oplus\ \mathsf{h} := 1]\!].(v,\delta) = \{\!\{(v, \{\!\{0\}\!\}),\ (v, \{\!\{1\}\!\})\}\!\}$,
which is an example of implicit flow.



¹¹ It is effectively the Kleisli composition over the outer distribution.

A similar implicit information flow in any Conditional choice with guard
$G.v.h$ makes it possible for an attacker to deduce the value of the guard
exactly.

For General prob. choice $P_1 \;{}_{p.\mathsf{v}.\mathsf{h}}{\oplus}\; P_2$
however, the implicit flow might only partially reveal the value of the
expression $p.v.h$. For example, suppose we execute the probabilistic
assignment $\mathsf{h} := \tfrac{1}{4} \oplus \tfrac{1}{2}$, which establishes
that h is either $\tfrac{1}{4}$ or $\tfrac{1}{2}$ with equal probability of
each: its output is $\{\!\{(v, \{\!\{\tfrac{1}{4}, \tfrac{1}{2}\}\!\})\}\!\}$.
Then we execute program
$\mathsf{skip} \;{}_{\mathsf{h}}{\oplus}\; \mathsf{skip}$ from there, and we
find that we do not entirely discover the value of h. But still we do discover
something: we find that

    $[\![\mathsf{skip} \;{}_{\mathsf{h}}{\oplus}\; \mathsf{skip}]\!].(v, \{\!\{\tfrac{1}{4}, \tfrac{1}{2}\}\!\}) = \{\!\{(v, \{\!\{\tfrac{1}{4}^{@1/3}, \tfrac{1}{2}^{@2/3}\}\!\})^{@3/8},\ (v, \{\!\{\tfrac{1}{4}^{@3/5}, \tfrac{1}{2}^{@2/5}\}\!\})^{@5/8}\}\!\}$ ,

and see that indeed the chance of guessing h's value has increased, though we
still do not know it for certain. Our probability initially of guessing h is
1/2. But after the choice we will guess $h = \tfrac{1}{2}$ when we see the
choice went left, which happens with probability 3/8; but if we saw the choice
going right we will guess $h = \tfrac{1}{4}$, which happens with probability
5/8. Our average chance of guessing h is thus (2/3)*(3/8) + (3/5)*(5/8) = 5/8,
which is more than the 1/2 it was initially: that increased knowledge is what
was revealed by the (${}_{\mathsf{h}}\oplus$).
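
The numbers in this example can be re-derived directly from the General prob. choice row of Fig. 1. The sketch below is illustrative only, with a hypothetical helper `condition` and a dictionary encoding of the inner distribution.

```python
from fractions import Fraction

def condition(delta, weight):
    """Return (total probability, posterior) of delta reweighted by weight(h)."""
    scaled = {h: p * weight(h) for h, p in delta.items()}
    total = sum(scaled.values())
    return total, {h: p / total for h, p in scaled.items() if p != 0}

# h is 1/4 or 1/2 with equal probability.
delta = {Fraction(1, 4): Fraction(1, 2), Fraction(1, 2): Fraction(1, 2)}

# The choice goes left with probability h and right with probability 1 - h.
p_left, left = condition(delta, lambda h: h)
p_right, right = condition(delta, lambda h: 1 - h)

# In each branch the attacker guesses the most likely value of h.
chance = p_left * max(left.values()) + p_right * max(right.values())
print(p_left, p_right, chance)
```

The branch probabilities come out as 3/8 and 5/8, and the average chance of a correct guess as 5/8, matching the figures in the text.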



3 The Bayes-Risk based elementary testing order 

The elementary testing order comprises functional- and security characteristics. 

Say that two programs are functionally equivalent iff from the same input 
they produce the same overall output distribution [16,21], defined for hyper- 
distribution A' to be h.A':={{v',S'):A';h':S' • {v',h')J^^ We consider state- 
space VxH jointly, i.e. not V alone, because differing distributions over h alone 
can be revealed by the context (— ; v:= h) that appends an assignment v:= h. 

We measure the security of a program with "Bayes Risk" [34, 5, 1, 2], which
determines an attacker's chance of guessing the final value of h in one try. The
most effective such attack is to determine which split-state $(v', \delta')$ in a final
hyper-distribution actually occurred, and then to guess that h has some value
h' that maximises $\delta'$, i.e. so that $\delta'.h' = \sqcup\delta'$. For a whole hyper-distribution we
average the attacks over its elements, weighted by the probability it gives to each,
and so we define the Bayes Vulnerability of $\Delta'$ to be $\mathrm{bv}.\Delta' := (\odot\,(v', \delta')\colon \Delta' \bullet \sqcup\delta')$,
that is the $\Delta'$-weighted average of $\sqcup\delta'$.

For Program S the vulnerability is the chance of guessing h by remembering 
v's intermediate value, say w, and then guessing that h at that point had the value 
most likely to have produced that v: when v=w (probability 1/2), guess h=2; 



Two program texts $P_{\{1,2\}}$ denote functionally equivalent secure programs just when
their classical denotations agree, that is when $[\![P_1]\!]_C = [\![P_2]\!]_C$. The function ft
expresses that semantically, and the connection is thus that $[\![P_1]\!]_C = [\![P_2]\!]_C$ just when
$\mathrm{ft}.([\![P_1]\!].(v,\delta)) = \mathrm{ft}.([\![P_2]\!].(v,\delta))$ for all $(v,\delta)$.

We use vulnerability rather than risk because "greatest chance of leak" is more 
convenient than the dual "least chance of no leak." Our definition corresponds to 
Smith's vulnerability [34]. 
when v=b, guess h=0. Via $\mathrm{bv}.\Delta'_S$ that vulnerability is 1/2*2/3 + 1/2*2/3 = 2/3.
For $I_1$, however, there is no "leaking" v, and so it is less vulnerable, having
$\mathrm{bv}.\Delta'_{I_1} = 1/3$.

The elementary testing order on hyper-distributions is then defined $\Delta_S \preceq \Delta_I$
iff $\mathrm{ft}.\Delta_S = \mathrm{ft}.\Delta_I$ and $\mathrm{bv}.\Delta_S \geq \mathrm{bv}.\Delta_I$, and it extends pointwise to the elementary
testing order on whole programs. That is, we say that $S \preceq I$ just when for
corresponding inputs (i) S, I are functionally equivalent and (ii) the vulnerability of I
is no more than the vulnerability of S. Thus $S \preceq I_1$ because they are functionally
equivalent and the vulnerabilities of $S, I_1$ are 2/3, 1/3 resp.

The direction of the inequality $(\preceq)$ corresponds to increasing security (and
thus decreasing vulnerability). This agrees with other notions of security that 
increase with increasing entropy of the hidden distribution. 
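The two ingredients of the elementary testing order can be sketched directly. The block below is an illustration only: it assumes a hyper-distribution is encoded as a list of triples (outer probability, visible value, inner distribution over h), and the concrete hyper-distributions for S and $I_1$ are reconstructed from the vulnerability calculation in the text (guess h=2 with chance 2/3 after w, guess h=0 with chance 2/3 after b), not quoted from it.

```python
from fractions import Fraction as F
from collections import defaultdict

def ft(hyper):
    """Overall output distribution over pairs (v', h')."""
    out = defaultdict(F)
    for p, v, delta in hyper:
        for h, q in delta.items():
            out[(v, h)] += p * q
    return dict(out)

def bv(hyper):
    """Bayes Vulnerability: outer-weighted average of each inner maximum."""
    return sum(p * max(delta.values()) for p, v, delta in hyper)

def elementary_leq(spec, impl):
    """spec is below impl: same functional behaviour, impl no more vulnerable."""
    return ft(spec) == ft(impl) and bv(spec) >= bv(impl)

BOT = "bot"  # stands for the overwritten visible value
delta_S = [(F(1, 2), BOT, {1: F(1, 3), 2: F(2, 3)}),
           (F(1, 2), BOT, {0: F(2, 3), 1: F(1, 3)})]
delta_I1 = [(F(1), BOT, {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)})]

print(bv(delta_S), bv(delta_I1), elementary_leq(delta_S, delta_I1))
```

Under these assumptions the check agrees with the text: S sits below $I_1$, but not the other way round.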



4 Non-compositionality of the elementary testing order 

Although $S \not\preceq I$ is an (elementary) failure of implementation, the complementary
$S \preceq I$ is not necessarily a success: it is quite possible, in spite of that, that there
is a context C with $C(S) \not\preceq C(I)$. That is, simply having $S \preceq I$ does not mean that
I is safe to use in place of S in general.

Thus for stepwise development we require more than just $S \preceq I$: we must
ensure that $C(S) \preceq C(I)$ holds for all contexts $C(\cdot)$ in which S, I might be placed,
and we do not know in advance what those contexts might be.

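Returning to the boxes, we consider now another variation Program $I_2$ in
which both Boxes 0,1 have two black balls: thus the program code becomes
$h{:=}\,0\oplus1\oplus2;\ v{:\in}\,\{\!\{w^{@(h\div2)}, b^{@1-(h\div2)}\}\!\};\ v{:=}\bot$ with final hyper-distribution

\[ \{\!\{\, (\bot, \{\!\{0, 1\}\!\})^{@\frac{2}{3}},\ (\bot, \{\!\{2\}\!\})^{@\frac{1}{3}} \,\}\!\}. \]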
The vulnerability of $I_2$ is 1/3*1 + 2/3*1/2, again 2/3, so that $S \preceq I_2$. Now if context
C is defined $(-\,;\ h{:=}\,h\div2)$, the vulnerability of C(S) is 1/2*2/3 + 1/2*1 = 5/6:
it is more than for S alone because there are fewer final h-values to choose from.
But for $C(I_2)$ it is greater still, at 1/3*1 + 2/3*1 = 1.

Thus $S \preceq I_2$ but $C(S) \not\preceq C(I_2)$, and so $(\preceq)$ is not compositional. This makes
$(\preceq)$ unsuitable, on its own, for secure-program development of any size; and its
failure of compositionality is the principal problem we solve.
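The counterexample can be checked numerically. This sketch assumes that the context halves h by integer division (the reading under which the stated figures 5/6 and 1 come out), and that the hyper-distributions of S and $I_2$ are as implied by the vulnerability sums in the text; visible values are omitted because both programs overwrite v with the same value.

```python
from fractions import Fraction as F

def bv(hyper):
    """Bayes Vulnerability of a list of (outer probability, inner distribution)."""
    return sum(p * max(d.values()) for p, d in hyper)

def post_assign(hyper, f):
    """Effect of appending the hidden assignment h := f(h)."""
    out = []
    for p, d in hyper:
        e = {}
        for h, q in d.items():
            e[f(h)] = e.get(f(h), F(0)) + q
        out.append((p, e))
    return out

# Assumed hyper-distributions (inner distributions over h = 0, 1, 2).
S = [(F(1, 2), {1: F(1, 3), 2: F(2, 3)}), (F(1, 2), {0: F(2, 3), 1: F(1, 3)})]
I2 = [(F(1, 3), {2: F(1)}), (F(2, 3), {0: F(1, 2), 1: F(1, 2)})]

def halve(h):
    return h // 2

print(bv(S), bv(I2))                                        # before the context
print(bv(post_assign(S, halve)), bv(post_assign(I2, halve)))  # after the context
```

Before the context the two vulnerabilities coincide; after it the implementation is strictly more vulnerable than the specification, which is the failure of compositionality described above.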



5 The refinement order, and compositional closure 

The compositional closure of an "elementary" partial order over programs, call
it $(\leq_E)$, is the largest subset of that order that is preserved by composition with
other programs, that is with being placed in a program context. Call that closure
$(\leq_C)$.

The utility of $(\leq_C)$ is first that $A \leq_C B$ implies $A \leq_E B$, so that $A \leq_C B$ suffices
if $A \leq_E B$ is all that we want: but it implies further that $C(A) \leq_E C(B)$ for all
contexts C, as well. Its being the greatest such subset of $(\leq_E)$ means that it
relates as many programs as possible, never claiming that $A \not\leq_C B$ unless there
is some context C that forces it to do so because in fact $C(A) \not\leq_E C(B)$.

Thus to address the non-compositionality exposed in §4, we seek the
compositional closure of $(\preceq)$, the unique refinement relation $(\sqsubseteq)$ such that (soundness)
if $S \sqsubseteq I$ then for all C we have $C(S) \preceq C(I)$; and (completeness) if $S \not\sqsubseteq I$ then for
some C we have $C(S) \not\preceq C(I)$. Soundness gives refinement the property (§4) we
need for stepwise development; and completeness makes refinement as liberal as
possible consistent with that.

We found above that $S \not\sqsubseteq I_2$; we show later (§6.4) that we do have $S \sqsubseteq I_1$.

6 Constructive definition of the refinement order 

Although saying that $(\sqsubseteq)$ is the compositional closure of $(\preceq)$ does define it
completely, it is of little use if to establish $S \sqsubseteq I$ in practice we have to evaluate
and compare $C(S) \preceq C(I)$ for all contexts C. Instead we seek an explicit
construction that is easily verified for specific cases. We give a detailed example to help
introduce our definition.

For integers x, n, let x rnd n be a distribution over the multiple(s) of n closest
to x: usually there will be exactly two such multiples, one on either side of x and,
in that case, the probabilities of each are inversely proportional to their distance
from x. Thus 1 rnd 4 is $\{\!\{0^{@\frac{3}{4}}, 4^{@\frac{1}{4}}\}\!\}$ and 2 rnd 4 is $\{\!\{0^{@\frac{1}{2}}, 4^{@\frac{1}{2}}\}\!\}$ and 3 rnd 4
is $\{\!\{0^{@\frac{1}{4}}, 4^{@\frac{3}{4}}\}\!\}$. If however x happens to be an integer multiple of n then the
outcome is definite, a point distribution: thus 0 rnd 4 = $\{\!\{0\}\!\}$ and 4 rnd 4 = $\{\!\{4\}\!\}$.
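As a concrete reading of that definition, here is a small sketch of x rnd n as a function returning a dictionary from outcomes to exact probabilities. The function name `rnd` mirrors the text; the dictionary encoding is an assumption made for illustration.

```python
from fractions import Fraction as F

def rnd(x, n):
    """Distribution over the multiple(s) of n closest to x, with probabilities
    inversely proportional to the distance from x."""
    lo = (x // n) * n            # nearest multiple at or below x
    if lo == x:
        return {x: F(1)}         # x is itself a multiple: point distribution
    hi = lo + n                  # nearest multiple above x
    return {lo: F(hi - x, n), hi: F(x - lo, n)}

for x in range(5):
    print(x, rnd(x, 4))
```

The closer multiple receives the larger probability because its weight is the distance to the other multiple, divided by n.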

Now consider the two programs 

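\[ P_2 \;:=\; h{:=}\,1\oplus2\oplus3;\ \ v{:\in}\,h \ \mathrm{rnd}\ 2;\ \ v{:=}\,h \bmod 2 \]

and

\[ P_4 \;:=\; h{:=}\,1\oplus2\oplus3;\ \ v{:\in}\,h \ \mathrm{rnd}\ 4;\ \ v{:=}\,h \bmod 2 . \]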

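Both reveal h mod 2 in v's final value v', but each $P_n$ also reveals in the
overwritten visible v, say, something about h rnd n; and intuition suggests that $P_n \preceq P_m$
for n<m only. Yet in fact the vulnerability is 5/6 for both $P_{\{2,4\}}$, which we can
see from their final hyper-distributions; they are $\Delta'_{P_2}$ and $\Delta'_{P_4}$ given by

\[ \{\!\{\, (0,\{\!\{2\}\!\})^{@\frac{1}{3}},\ (1,\{\!\{1\}\!\})^{@\frac{1}{6}},\ (1,\{\!\{1,3\}\!\})^{@\frac{1}{3}},\ (1,\{\!\{3\}\!\})^{@\frac{1}{6}} \,\}\!\} \qquad (\Delta'_{P_2}) \]
\[ \{\!\{\, (0,\{\!\{2\}\!\})^{@\frac{1}{3}},\ (1,\{\!\{1^{@\frac{3}{4}},3^{@\frac{1}{4}}\}\!\})^{@\frac{1}{3}},\ (1,\{\!\{1^{@\frac{1}{4}},3^{@\frac{3}{4}}\}\!\})^{@\frac{1}{3}} \,\}\!\} \qquad (\Delta'_{P_4}) \]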

With overall probability 1/3*3/4 + 1/3*1/4 = 1/3 the final v' will be 1 and the overwritten v will be 0; since
v' is 1 then h must be 1 or 3; but if v was 0 then h is three times as likely to have been 1.

so that e.g. 1/3*1 + 1/3*3/4 + 1/3*3/4 = 5/6 for $P_4$. The overall distribution
of (v',h') is $\{\!\{(0, 2), (1, 1), (1, 3)\}\!\}$ in both cases, so that $P_{\{2,4\}}$ are functionally
equivalent; but they have different residual uncertainties of h.
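Both claims (equal vulnerability 5/6 and functional equivalence) can be recomputed by enumerating the runs of $P_n$. The sketch below assumes the reading of the programs given above (h uniform over 1, 2, 3; v drawn from h rnd n; v finally overwritten with h mod 2) and records, for each pair (final v', overwritten v), the joint weight of each h. The helper names are illustrative.

```python
from fractions import Fraction as F
from collections import defaultdict

def rnd(x, n):
    """x rnd n, as described in the text (dictionary encoding is an assumption)."""
    lo = (x // n) * n
    if lo == x:
        return {x: F(1)}
    return {lo: F(lo + n - x, n), lo + n: F(x - lo, n)}

def run(n):
    """Fractions of P_n, keyed by (final v', overwritten v); each maps h to a joint weight."""
    fr = defaultdict(dict)
    for h in (1, 2, 3):
        for v, p in rnd(h, n).items():
            fr[(h % 2, v)][h] = F(1, 3) * p
    return dict(fr)

def bv(fr):
    """Bayes Vulnerability: sum of the largest weight in every fraction."""
    return sum(max(f.values()) for f in fr.values())

def ft(fr):
    """Overall distribution of (v', h'), forgetting the overwritten v."""
    out = defaultdict(F)
    for (v_final, _), f in fr.items():
        for h, p in f.items():
            out[(v_final, h)] += p
    return dict(out)

print(bv(run(2)), bv(run(4)))
print(ft(run(2)) == ft(run(4)))
```

Note that for $P_4$ the two fractions with v'=0 are multiples of each other, so they collapse into a single element of the hyper-distribution; that merging does not affect the vulnerability.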

6.1 Hyper-distributions as partitions of fractions 

In our definition of refinement we will consider the hyper-distributions corre- 
sponding to each value of v separately. 
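In the example above, if we consider just the h-distributions associated with
v'=1 then we can, by multiplying through their associated probabilities from
the hyper-distributions, present them as a collection of fractions, that is
sub-distributions over $\mathcal{H}$. We call such collections partitions and here they are given
for $P_2$ and $P_4$ respectively by

\[
\begin{array}{ll}
\text{when } v'=1\colon & \Pi_{P_2} = \langle\, \{\!\{1^{@\frac{1}{6}}\}\!\},\ \{\!\{1^{@\frac{1}{6}}, 3^{@\frac{1}{6}}\}\!\},\ \{\!\{3^{@\frac{1}{6}}\}\!\} \,\rangle \\[2pt]
\text{when } v'=1\colon & \Pi_{P_4} = \langle\, \{\!\{1^{@\frac{1}{4}}, 3^{@\frac{1}{12}}\}\!\},\ \{\!\{1^{@\frac{1}{12}}, 3^{@\frac{1}{4}}\}\!\} \,\rangle
\end{array}
\qquad (4)
\]

In general, let the function $\mathrm{fracs}.\Delta.v$ for hyper-distribution $\Delta$ and value v
give the partition of fractions extracted from $\Delta$ for that value v of the visible variable, as we extracted $\Pi_{P_{\{2,4\}}}$
from $\Delta'_{P_{\{2,4\}}}$ and v'=1 at (4) above.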

6.2 Operations on fractions and partitions 

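Distribution operations such as support $(\lceil\cdot\rceil)$ and weight $(\sum)$ and normalise
$(\lfloor\cdot\rfloor)$ apply to fractions, and for example we have that $\{\!\{1^{@\frac{1}{6}}\}\!\} + \{\!\{1^{@\frac{1}{6}}, 3^{@\frac{1}{6}}\}\!\}$ is
$\{\!\{1^{@\frac{1}{3}}, 3^{@\frac{1}{6}}\}\!\}$ and $\sum\{\!\{1^{@\frac{1}{3}}, 3^{@\frac{1}{6}}\}\!\}$ is 1/2 and $\lfloor\{\!\{1^{@\frac{1}{3}}, 3^{@\frac{1}{6}}\}\!\}\rfloor$ is $\{\!\{1^{@\frac{2}{3}}, 3^{@\frac{1}{3}}\}\!\}$. For
partitions $\Pi$ we write $\sum\Pi$ as shorthand for $\langle(\sum \pi\colon \Pi \bullet \pi)\rangle$, so that

\[ \sum \langle\, \{\!\{1^{@\frac{1}{6}}\}\!\},\ \{\!\{1^{@\frac{1}{6}}, 3^{@\frac{1}{6}}\}\!\},\ \{\!\{3^{@\frac{1}{6}}\}\!\} \,\rangle \quad\text{is}\quad \langle\, \{\!\{1^{@\frac{1}{3}}, 3^{@\frac{1}{3}}\}\!\} \,\rangle . \]

Note that the sum of a partition is still a partition, albeit always with only a
single fraction in it. Scaling, when applied to a partition, is applied pointwise to each
of its fractions. An empty partition is written $\langle\rangle$, and a zero(-weight) fraction is
written $\{\!\{\}\!\}$; thus $\langle\{\!\{\}\!\}\rangle$ is a zero-weight partition containing exactly one fraction.
Finally, the Bayes Vulnerability of a partition $\mathrm{bv}.\Pi$ is $(\sum \pi\colon \Pi \bullet \sqcup\pi)$, and
the Bayes Vulnerability of a hyper-distribution may be equivalently expressed
using partitions as $(\sum v\colon \mathcal{V} \bullet \mathrm{bv}.(\mathrm{fracs}.\Delta.v))$.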
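These operations are easy to prototype. The sketch below assumes a fraction is a dictionary from hidden values to exact weights and a partition is a list of such dictionaries; the function names are hypothetical, and the example values are the fractions obtained above for $P_2$ and $P_4$ at v'=1.

```python
from fractions import Fraction as F

def add(*fracs):
    """Pointwise sum of fractions (sub-distributions over h)."""
    out = {}
    for f in fracs:
        for h, p in f.items():
            out[h] = out.get(h, F(0)) + p
    return out

def weight(f):
    """Total probability mass of a fraction."""
    return sum(f.values(), F(0))

def normalise(f):
    """Scale a non-zero fraction up to a full distribution."""
    w = weight(f)
    return {h: p / w for h, p in f.items()}

def support(f):
    """Hidden values to which the fraction gives non-zero weight."""
    return {h for h, p in f.items() if p > 0}

def bv(partition):
    """Bayes Vulnerability of a partition: sum of each fraction's maximum."""
    return sum((max(f.values()) for f in partition if f), F(0))

s = add({1: F(1, 6)}, {1: F(1, 6), 3: F(1, 6)})
pi_p4 = [{1: F(1, 4), 3: F(1, 12)}, {1: F(1, 12), 3: F(1, 4)}]
print(s, weight(s), normalise(s), support(s), bv(pi_p4))
```

Empty dictionaries play the role of zero fractions and are skipped by `bv`, since they contribute nothing to the sum.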

6.3 Relationships between fractions and partitions 

Say that two non-zero fractions $\pi_{\{1,2\}}$ are similar, written $\pi_1 \approx \pi_2$, just when their
normalisations are equal, that is when $\lfloor\pi_1\rfloor = \lfloor\pi_2\rfloor$ so that they are multiples of
each other: this is an equivalence relation. For example we have $\{\!\{1^{@\frac{1}{3}}, 2^{@\frac{2}{3}}\}\!\} \approx
\{\!\{1^{@\frac{1}{4}}, 2^{@\frac{1}{2}}\}\!\}$ because both normalise to the former.

Say that a partition is reduced just when it contains no two similar fractions,
and no zero fractions at all. For any hyper-distribution $\Delta$ and value v, we
have that $\mathrm{fracs}.\Delta.v$ is in reduced form by construction. Thus partitions are more
expressive than hyper-distributions.

The reduction of a partition is obtained by adding-up all its similar
fractions and removing its all-zero fractions, that is by reducing it, and we say that



^^ Strictly speaking, partitions are multisets of fractions, i.e. without order but possibly 
having repeated elements. 
Allowing zero fractions, in the unreduced case, simplifies some proofs.

two partitions $\Pi_{\{1,2\}}$ are similar, written $\Pi_1 \approx \Pi_2$, just when they have the same
reduction. Thus for example we have

because both reduce to $\langle\{\!\{0^{@\frac{1}{3}}\}\!\}, \{\!\{1^{@\frac{1}{3}}, 2^{@\frac{1}{3}}\}\!\}\rangle$. If two partitions are similar then,
for any distribution S over H, the probability that an attacker may deduce that 
h is distributed according to S is the same in either partition. 

Say that one partition $\Pi_1$ is as fine as another $\Pi_2$, written $\Pi_1 \sqsubseteq \Pi_2$, just
when $\Pi_2$ can be obtained by adding-up one or more groups of fractions in $\Pi_1$.
Thus for example we have 

by adding-up the second and third fractions on the left. For as-fine-as the
added-up fractions do not have to be similar: if however they are similar, then we have
$\Pi_1 \approx \Pi_2$ as well as $\Pi_1 \sqsubseteq \Pi_2$; if they are not similar, we can write $\Pi_1 \sqsubset \Pi_2$.

Combining two dissimilar fractions in a partition represents removal of the
implicit observations that distinguished them. Hence if $\Pi_1 \sqsubset \Pi_2$ then partition
$\Pi_2$ conceals h strictly better than $\Pi_1$ does.

Note that in both cases $\Pi_1 \approx \Pi_2$ and $\Pi_1 \sqsubseteq \Pi_2$ we have $\sum\Pi_1 = \sum\Pi_2$, i.e.
that neither relation allows a change in the overall probability assigned to each
of the elements.



6.4 Constructive definition of refinement 

We use the relations $(\approx)$ and $(\sqsubseteq)$ between partitions to define refinement.

Definition 2. Secure refinement We say that hyper-distribution $\Delta_S$ is
securely-refined by $\Delta_I$, written $\Delta_S \sqsubseteq \Delta_I$, just when for every v there is some intermediate
partition $\Pi$ of fractions so that first (i) $\mathrm{fracs}.\Delta_S.v$ is similar to $\Pi$ and then (ii)
$\Pi$ is as fine as $\mathrm{fracs}.\Delta_I.v$. That is, we have

\[ \Delta_S \sqsubseteq \Delta_I \quad\text{iff}\quad \mathrm{fracs}.\Delta_S.v \;\approx\; \Pi \;\sqsubseteq\; \mathrm{fracs}.\Delta_I.v \ \text{ for some partition } \Pi. \]

The fractions of $\Delta_S$ are first split-up into similar sub-fractions; and then some
of those sub-fractions are rejoined to create the fractions of $\Delta_I$.

Refinement of hyper-distributions extends pointwise to the programs that
produce them. □

Note that since both $(\approx)$ and $(\sqsubseteq)$ preserve partition-sum, we have that $(\sqsubseteq)$
from Def. 2 implies functional equality. Informally speaking, refinement may
not change the functional behaviour of a secure program, but it may reduce
the implicit observations available to an attacker, and hence the deductions an
attacker can make about h.



^^ In our earlier qualitative work [27] refinement reduces to taking unions of equivalence 
classes of hidden values, so-called "Shadows." Kopf et al. observe similar effects [15]. 
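We return to $\Delta'_S$ for an example, getting $\langle\{\!\{1^{@\frac{1}{6}}, 2^{@\frac{1}{3}}\}\!\}, \{\!\{0^{@\frac{1}{3}}, 1^{@\frac{1}{6}}\}\!\}\rangle$ for
$\mathrm{fracs}.\Delta'_S.\bot$ by multiplying through. For $\Delta'_{I_1}$ we get $\langle\{\!\{0^{@\frac{1}{3}}, 1^{@\frac{1}{3}}, 2^{@\frac{1}{3}}\}\!\}\rangle$
similarly for $\mathrm{fracs}.\Delta'_{I_1}.\bot$. The two fractions of the former sum to the single fraction
of the latter, and so $S \sqsubseteq I_1$ according to our definition Def. 2 of secure refinement.

For the more detailed $\Delta'_{P_2} \sqsubseteq \Delta'_{P_4}$ and v'=1, we need the intermediate partition
$\Pi = \langle\{\!\{1^{@\frac{1}{6}}\}\!\}, \{\!\{1^{@\frac{1}{12}}, 3^{@\frac{1}{12}}\}\!\}, \{\!\{1^{@\frac{1}{12}}, 3^{@\frac{1}{12}}\}\!\}, \{\!\{3^{@\frac{1}{6}}\}\!\}\rangle$, whose middle two fractions
turn out to be equal, thus certainly similar: summing them gives the middle
$\{\!\{1^{@\frac{1}{6}}, 3^{@\frac{1}{6}}\}\!\}$ of $\Pi_{P_2}$, so that $\Pi_{P_2} \approx \Pi$. On the other hand, summing the first two
fractions of $\Pi$ gives $\{\!\{1^{@\frac{1}{4}}, 3^{@\frac{1}{12}}\}\!\}$, the first fraction of $\Pi_{P_4}$, and summing the
last two gives the second fraction of $\Pi_{P_4}$; thus $\Pi \sqsubseteq \Pi_{P_4}$. Partition $\langle\{\!\{2^{@\frac{1}{3}}\}\!\}\rangle$ deals
trivially with v'=0, and so indeed we have $P_2 \sqsubseteq P_4$ altogether. In §D we show
however that $P_4 \not\sqsubseteq P_2$.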
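The two steps of Def. 2 for this example can be verified mechanically. The sketch below assumes the same dictionary encoding of fractions as before; `reduce_partition`, `similar` and `merge` are hypothetical helper names, and the witness (the intermediate partition and the grouping of its fractions) is supplied by hand rather than searched for.

```python
from fractions import Fraction as F

def add(fracs):
    """Pointwise sum of a list of fractions."""
    out = {}
    for f in fracs:
        for h, p in f.items():
            out[h] = out.get(h, F(0)) + p
    return out

def reduce_partition(partition):
    """Add up similar fractions and drop zero ones; keyed by normalised shape."""
    groups = {}
    for f in partition:
        w = sum(f.values(), F(0))
        if w == 0:
            continue
        shape = frozenset((h, p / w) for h, p in f.items() if p > 0)
        groups[shape] = groups.get(shape, F(0)) + w
    return groups

def similar(p1, p2):
    """Step (i): two partitions are similar when their reductions agree."""
    return reduce_partition(p1) == reduce_partition(p2)

def merge(partition, groups):
    """Step (ii): add up the fractions in each group of indices."""
    return [add([partition[i] for i in g]) for g in groups]

pi_p2 = [{1: F(1, 6)}, {1: F(1, 6), 3: F(1, 6)}, {3: F(1, 6)}]
pi_p4 = [{1: F(1, 4), 3: F(1, 12)}, {1: F(1, 12), 3: F(1, 4)}]
pi_mid = [{1: F(1, 6)}, {1: F(1, 12), 3: F(1, 12)},
          {1: F(1, 12), 3: F(1, 12)}, {3: F(1, 6)}]

print(similar(pi_p2, pi_mid))                      # split the middle fraction
print(merge(pi_mid, [[0, 1], [2, 3]]) == pi_p4)    # rejoin into P4's fractions
```

Checking a given witness is straightforward; finding one in general is the harder problem, which the matrix formulation of §7 addresses.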

6.5 Properties of refinement 

The refinement relation $(\sqsubseteq)$ is a partial order (hence it is transitive), and
program contexts preserve it (thus it is monotonic). Consequently, we can reason
incrementally and compositionally about the refinement relation between large
programs.

Theorem 1. Partial order The refinement relation $(\sqsubseteq)$ is a partial order over
the set of hyper-distributions; and so, by extension, it is a partial order over
programs.

Proof: See §C.1. □

Theorem 2. Monotonicity of refinement If $S \sqsubseteq I$ then $C(S) \sqsubseteq C(I)$ for all
contexts C built from programs as defined in Fig. 1.

Proof: See §C.2. □

Furthermore, we define strict refinement such that $S \sqsubset I$ when $S \sqsubseteq I$ but
$I \not\sqsubseteq S$.

7 Refinement $(\sqsubseteq)$ is the compositional closure of $(\preceq)$

In this proof we will manipulate partitions, sequential composition, refinement 
and Bayes Vulnerability in terms of matrices, as follows. 

7.1 Matrix representation and manipulation of partitions 

Partitions as matrices Assume wlog that $\mathcal{H}$ is the integers 1..H. For a
particular input $(v, \delta)$ and a chosen visible output v', a program P will produce as
output a partition $\Pi := \mathrm{fracs}.([\![P]\!].(v, \delta)).v'$ over hidden values containing some
number F of fractions that we index 1..F. Each fraction on its own is a vector
of length H of probabilities; if we put them together as rows, we get an
F×H-matrix that represents the partition as a whole. For example, we have from (4)
the following matrix representations of partitions output from Programs $P_{\{2,4\}}$
for v'=1:

\[
\Pi_{P_2}\colon \begin{pmatrix} 1/6 & 0 & 0 \\ 1/6 & 0 & 1/6 \\ 0 & 0 & 1/6 \end{pmatrix}
\qquad
\Pi_{P_4}\colon \begin{pmatrix} 1/4 & 0 & 1/12 \\ 1/12 & 0 & 1/4 \end{pmatrix}
\qquad (5)
\]

There are three possible values of h in each case, so that H=3; and $P_2$'s partition
has 3 fractions, so that $F_2=3$ and thus it generates a 3×3 matrix. Program $P_4$'s
partition has only 2 fractions, so that $F_4=2$ and it generates a 2×3 matrix.

For simplicity in the proof, we will arrange that H=F so that all matrices are
of the same (square) dimension N×N. This is without loss of generality, since
we can extend $\mathcal{H}$ with extra, unused values; and we can extend our partitions
with extra, zero fractions. For instance $\Pi_{P_4}$ becomes a 3×3 matrix, as $\Pi_{P_2}$ is
already, if we add an extra row underneath (representing an all-zero fraction):

\[
\Pi_{P_4}\colon \begin{pmatrix} 1/4 & 0 & 1/12 \\ 1/12 & 0 & 1/4 \\ 0 & 0 & 0 \end{pmatrix} .
\qquad (6)
\]





(A) Sequential composition as matrix multiplication In our completeness
proof, our program-differentiating context C will post-compose a probabilistic
assignment $h{:\in}\,D.h$ so that, for each of its incoming values h, the output value
h' will be chosen from the distribution D.h, thus with probability D.h.h'. In
effect the context redistributes variable h in a way that depends on its current
value.

We can consider D itself to be an N×N matrix whose value in row h and
column h' is just D.h.h'. If we do that, then the output partition $\Pi'$ that results
from executing $h{:\in}\,D.h$ on input partition $\Pi$ is just $\Pi\times D$, where (×) is matrix
multiplication. For example, suppose our post-composed context were

\[ h{:\in}\,(\{\!\{1^{@\frac{1}{2}}, 2^{@\frac{1}{4}}, 3^{@\frac{1}{4}}\}\!\} \ \textbf{if}\ h{=}1 \ \textbf{else}\ \{\!\{2^{@\frac{1}{2}}, 3^{@\frac{1}{2}}\}\!\}), \qquad (7) \]

so that matrix D would be

\[ \begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 0 & 1/2 & 1/2 \\ 0 & 1/2 & 1/2 \end{pmatrix} . \]

From (5) we take the incoming partition $\Pi$ to the post-composed context (7) to
be the outgoing partition $\Pi_{P_2}$ from Program $P_2$, and so determine the outgoing
partition $\Pi'$ from $(P_2;\ h{:\in}\,D.h)$ overall to be

\[
\begin{pmatrix} 1/6 & 0 & 0 \\ 1/6 & 0 & 1/6 \\ 0 & 0 & 1/6 \end{pmatrix}
\times
\begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 0 & 1/2 & 1/2 \\ 0 & 1/2 & 1/2 \end{pmatrix}
=
\begin{pmatrix} 1/12 & 1/24 & 1/24 \\ 1/12 & 1/8 & 1/8 \\ 0 & 1/12 & 1/12 \end{pmatrix} .
\]



(B) Refinement as matrix multiplication Also refinement can be
formulated as matrix multiplication, since it is essentially a rearranging of fractions
within a partition that, therefore, boils down to rearrangement of rows within
a matrix. For example, from §6.4 we recall that to refine $\Pi_{P_2}$ into $\Pi_{P_4}$ we split
the middle fraction of the former into two equal pieces, and add them to the
other two, and that is achieved by the left-hand matrix in the pre-multiplication
shown here:

\[
\begin{pmatrix} 1 & 1/2 & 0 \\ 0 & 1/2 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\times
\begin{pmatrix} 1/6 & 0 & 0 \\ 1/6 & 0 & 1/6 \\ 0 & 0 & 1/6 \end{pmatrix}
=
\begin{pmatrix} 1/4 & 0 & 1/12 \\ 1/12 & 0 & 1/4 \\ 0 & 0 & 0 \end{pmatrix} .
\]

In general a partition $\Pi_S$ is refined by $\Pi_I$ iff there exists a refinement matrix
R, a matrix whose columns are non-negative and one-summing, such that $R\times\Pi_S$
equals $\Pi_I$. Entry (r, c) of such a refinement matrix describes what proportion of
the c-th fraction (row) of $\Pi_S$ is to contribute by addition to the r-th fraction of
$\Pi_I$.
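Both matrix formulations, (A) and (B), can be replayed with exact rational arithmetic. The sketch below uses plain nested lists; the helper `Q` (which parses strings such as "1/6") and the zero entries filled in for the blank matrix positions are assumptions, derived from the partitions at (4) and the context at (7).

```python
from fractions import Fraction as F

def Q(rows):
    """Build a matrix of exact fractions from strings like '1/6'."""
    return [[F(x) for x in row] for row in rows]

def matmul(a, b):
    """Ordinary matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

PI_P2 = Q([["1/6", "0", "0"], ["1/6", "0", "1/6"], ["0", "0", "1/6"]])
PI_P4 = Q([["1/4", "0", "1/12"], ["1/12", "0", "1/4"], ["0", "0", "0"]])
D = Q([["1/2", "1/4", "1/4"], ["0", "1/2", "1/2"], ["0", "1/2", "1/2"]])
R = Q([["1", "1/2", "0"], ["0", "1/2", "1"], ["0", "0", "0"]])

# (A) post-composing h:in D.h multiplies the partition by D on the right.
print(matmul(PI_P2, D))

# (B) refinement multiplies the partition by R on the left; R's columns sum to one.
print(all(sum(col) == 1 for col in zip(*R)), matmul(R, PI_P2) == PI_P4)
```

Rows of a partition matrix are fractions and columns are hidden values, so right-multiplication acts on h while left-multiplication regroups fractions.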

(C) Bayes Vulnerability as matrix multiplication Finally we bring Bayes
Vulnerability into the matrix algebra as well. For a partition $\Pi$ as a matrix, the
vulnerability is found by taking the individual row maxima and adding them
together: the result is a scalar. Thus for $\Pi_{P_4}$, for example, we have the matrix

\[
\begin{pmatrix} \mathbf{1/4} & 0 & 1/12 \\ 1/12 & 0 & \mathbf{1/4} \\ \mathbf{0} & 0 & 0 \end{pmatrix}
\quad\text{with maxima selected by the strategy matrix } G\colon\quad
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\]

whose maxima have been set in bold and are selected by the 1 entries in the
matrix G at right. Note that strategy matrices have the same shape as the
matrix from which they select, and that they are 0/1 matrices with exactly one
1 per row.

To determine the vulnerability associated with $\Pi$, we calculate

\[ (\sqcup\ \text{strategy matrices } G \bullet \mathrm{tr}.(G^{T}\times\Pi)) \qquad (8) \]

in general, where $(\cdot)^{T}$ is matrix transpose and tr takes the trace of a square
matrix, i.e. the sum of its diagonal. Note that the maximum is actually attained,
for some G, since there are only finitely many of them. In this particular case
we use the G above to calculate $\mathrm{tr}.(G^{T}\times\Pi_{P_4})$, and have therefore

\[
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\times
\begin{pmatrix} 1/4 & 0 & 1/12 \\ 1/12 & 0 & 1/4 \\ 0 & 0 & 0 \end{pmatrix}
=
\begin{pmatrix} 1/4 & 0 & 1/12 \\ 0 & 0 & 0 \\ 1/12 & 0 & 1/4 \end{pmatrix}
\]

whose trace is 1/4 + 0 + 1/4 = 1/2 to give the Bayes Vulnerability of $\Pi_{P_4}$.
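The claim that maximising over strategy matrices recovers the sum of row maxima can be checked by brute force for N=3. This sketch assumes the padded 3×3 matrix for $P_4$ at v'=1, with zeros in the positions the text leaves blank; the function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

# Assumed partition matrix for P4 at v'=1 (rows: fractions, columns: h = 1, 2, 3).
PI_P4 = [[F(1, 4), F(0), F(1, 12)],
         [F(1, 12), F(0), F(1, 4)],
         [F(0), F(0), F(0)]]

def strategy_matrices(n):
    """All n-by-n 0/1 matrices with exactly one 1 in every row."""
    for cols in product(range(n), repeat=n):
        yield [[1 if j == c else 0 for j in range(n)] for c in cols]

def trace_gt_pi(g, pi):
    """tr(G^T x Pi): entry (j, j) of G^T x Pi is sum_i G[i][j] * Pi[i][j],
    so the trace adds up exactly the entries of Pi that G selects."""
    n = len(pi)
    return sum(g[i][j] * pi[i][j] for i in range(n) for j in range(n))

best = max(trace_gt_pi(g, PI_P4) for g in strategy_matrices(3))
row_maxima = sum(max(row) for row in PI_P4)
print(best, row_maxima)
```

There are $N^N$ strategy matrices (27 here), so enumeration is only practical for tiny examples; summing row maxima gives the same value directly.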





Of course in an all-zero row it makes no difference which entry is selected. 
(D) The connection between strategy matrices and refinement For any
strategy matrix G, the transpose $G^{T}$ has exactly one 1 in each column, and thus
can be regarded as a simple refinement matrix, one of those which (when
pre-multiplied with a partition) merges only whole fractions. If we denote the set of
N×N strategy matrices by $\mathcal{G}_N$, the set of N×N refinement matrices by $\mathcal{R}_N$,
and the simple subset of these (having only one non-zero entry per column) by
$\mathcal{M}_N$, we thus have that

\[ \{G\colon \mathcal{G}_N \bullet G^{T}\} \;=\; \mathcal{M}_N \;\subseteq\; \mathcal{R}_N . \]

Furthermore, it can be shown that the complete set of refinement matrices $\mathcal{R}_N$
is in fact the convex closure of its simple subset:

\[ \mathcal{R}_N \;=\; \mathrm{ccl}.(\mathcal{M}_N) . \qquad (9) \]

From (8) and by linearity of matrix operations multiplication and trace, we thus
have for any N×N-dimensional $\Pi$ that the Bayes Vulnerability is given by

\[ \mathrm{bv}.\Pi \;=\; (\sqcup G\colon \mathcal{G}_N \bullet \mathrm{tr}.(G^{T}\times\Pi)) \;=\; (\sqcup R\colon \mathcal{R}_N \bullet \mathrm{tr}.(R\times\Pi)) , \qquad (10) \]

because the extra elements in $\mathcal{R}_N$ but not $\mathcal{G}_N$ are only interpolations, and so
cannot increase the maximum of a linear expression. (Recall from above that
this maximum is attained for some R.)

Additionally, $\mathcal{R}_N$ forms a monoid under matrix multiplication, that is

\[ (\mathcal{R}_N, \times, I_N) \text{ is a monoid,} \qquad (11) \]

where $I_N$ is the N×N unit of matrix multiplication. We refer to §A for a
proof of Properties (9) and (11).



7.2 Soundness 

Here from $S \sqsubseteq I$ we must show that $C(S) \preceq C(I)$ for all contexts C. From
monotonicity (Thm. 2) it suffices to show that $S \sqsubseteq I$ implies $S \preceq I$.

Fix an initial split-state and construct the output hyper-distributions $\Delta'_{\{S,I\}}$
that result from S, I respectively. Then since we assume $S \sqsubseteq I$ we must have
$\Delta'_S \sqsubseteq \Delta'_I$. We now show that this implies $\Delta'_S \preceq \Delta'_I$.

Since $S \sqsubseteq I$ trivially guarantees that $\mathrm{ft}.\Delta'_S = \mathrm{ft}.\Delta'_I$ -recall Def. 2- we need
to show that the Bayes-Vulnerability condition in the elementary testing
order is satisfied. Since $\mathrm{bv}.\Delta = (\sum v\colon \mathcal{V} \bullet \mathrm{bv}.(\mathrm{fracs}.\Delta.v))$, it is enough to show
that for each $v\colon \mathcal{V}$ the vulnerability of $\Pi'_S := \mathrm{fracs}.\Delta'_S.v$ is no less than that of
$\Pi'_I := \mathrm{fracs}.\Delta'_I.v$.

For any such $\Pi'_{\{S,I\}}$, assume wlog that they are represented as N×N matrices.
We then have that



Note that it is not a group because only the matrices in $\mathcal{R}_N$ that permute -but do
not combine- fractions have inverses.
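\[
\begin{array}{cll}
 & \mathrm{bv}.\Pi'_S \\[2pt]
= & (\sqcup R\colon \mathcal{R}_N \bullet \mathrm{tr}.(R\times\Pi'_S)) & \text{``from (D), Property (10)''} \\[2pt]
= & (\sqcup R_1, R_2\colon \mathcal{R}_N \bullet \mathrm{tr}.(R_1\times R_2\times\Pi'_S)) & \text{``from (D), Property (11)''} \\[2pt]
\geq & (\sqcup R_1\colon \mathcal{R}_N \bullet \mathrm{tr}.(R_1\times R_2\times\Pi'_S)) & \text{``for any } R_2\text{''} \\[2pt]
= & (\sqcup R_1\colon \mathcal{R}_N \bullet \mathrm{tr}.(R_1\times\Pi'_I)) & \text{``from (B), choose } R_2 \text{ so that } R_2\times\Pi'_S = \Pi'_I\text{''} \\[2pt]
= & \mathrm{bv}.\Pi'_I . & \text{``from (D), Property (10)''}
\end{array}
\]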

That gives us 

Theorem 3. Refinement is sound for Bayes Risk If $S \sqsubseteq I$ then $C(S) \preceq C(I)$
for all contexts C. □
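The key inequality in the soundness argument, that pre-multiplying a partition by a refinement matrix never increases its Bayes Vulnerability, can be spot-checked on random instances. The sketch below is an experiment, not a proof: `random_refinement` is a hypothetical sampler of matrices with non-negative, one-summing columns, and the partition used is the one assumed above for $P_2$ at v'=1.

```python
import random
from fractions import Fraction as F

def matmul(a, b):
    """Ordinary matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def bv(pi):
    """Bayes Vulnerability of a partition matrix: sum of its row maxima."""
    return sum(max(row) for row in pi)

def random_refinement(n, rng):
    """Random n-by-n matrix whose columns are non-negative and sum to one."""
    cols = []
    for _ in range(n):
        w = [rng.randint(0, 5) for _ in range(n)]
        w[rng.randrange(n)] += 1            # ensure the column is not all zero
        cols.append([F(x, sum(w)) for x in w])
    return [list(row) for row in zip(*cols)]

PI_S = [[F(1, 6), F(0), F(0)],
        [F(1, 6), F(0), F(1, 6)],
        [F(0), F(0), F(1, 6)]]

rng = random.Random(0)
worst = max(bv(matmul(random_refinement(3, rng), PI_S)) for _ in range(200))
print(bv(PI_S), worst)
```

The bound holds for every sample because each row maximum of $R\times\Pi$ is at most the R-weighted sum of the row maxima of $\Pi$, and the weights in each column of R sum to one.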



7.3 Completeness 

Here from $S \not\sqsubseteq I$ we must discover a context C such that $C(S) \not\preceq C(I)$. (The proof
here is self-contained; but as background we give a fully worked example in §D).

Since $S \not\sqsubseteq I$, there must be an initial split-state $(v, \delta)$ from which S, I yield
hyper-distributions $\Delta'_{\{S,I\}}$ with $\Delta'_S \not\sqsubseteq \Delta'_I$. We can assume however that $\Delta'_{\{S,I\}}$
give equal overall probabilities to visible variables since, if they did not, they
would be functionally different, giving $S \not\preceq I$ immediately. This being so, we can
assume that for some final v' we have that partition $\Pi_S := \mathrm{fracs}.\Delta'_S.v'$ cannot be
transformed into partition $\Pi_I := \mathrm{fracs}.\Delta'_I.v'$ via the two steps (i), (ii) in Def. 2.
That is, we have $\Pi_S \not\sqsubseteq \Pi_I$.

We will define a distribution D such that the context (- ; C) where C is

    if v=v' then h:∈ D.h else h:= 0 fi

can be used to differentiate S from I using elementary testing.

We dispose of the simple case first: if $v'' \neq v'$ then $\mathrm{fracs}.([\![S; C]\!].(v, \delta)).v''$ equals
$\mathrm{fracs}.([\![I; C]\!].(v, \delta)).v''$ since, first, hyper-distributions $\Delta'_{\{S,I\}}$ give equal
probabilities to that v'' and, second, the final value h' of h is zero for both S; C and
I; C in that case. The vulnerability associated with these partitions is therefore
the same. To establish $[\![S; C]\!] \not\preceq [\![I; C]\!]$ for our chosen C, it is thus enough to
show that the vulnerability of $\mathrm{fracs}.([\![I; C]\!].(v, \delta)).v'$ is strictly greater than for
$\mathrm{fracs}.([\![S; C]\!].(v, \delta)).v'$. Treating $\Pi_{\{S,I\}}$ as N×N matrices, we calculate

\[
\begin{array}{cll}
 & \text{``Bayes Vulnerability of } \Pi_S; C\text{''} \\[2pt]
= & \text{``Bayes Vulnerability of } \Pi_S\times D\text{''} & \text{``(A) above; definition of } C \text{ based on } D\text{''} \\[2pt]
= & \mathrm{tr}.(R\times\Pi_S\times D) & \text{``(10) in (D) above; for some maximising } R\in\mathcal{R}_N\text{''} \\[2pt]
= & \mathrm{tr}.(\Pi\times D) & \text{``(B) above; for refinement } \Pi = R\times\Pi_S \text{ of } \Pi_S\text{''} \\[2pt]
< & \mathrm{tr}.(\Pi_I\times D) & \text{``} D \text{ was chosen in advance, using the Separating Hyperplane} \\
 & & \text{Lemma, and does not depend on } \Pi\text{: see below''} \\[2pt]
= & \mathrm{tr}.(I_N\times\Pi_I\times D) & \text{``identity''} \\[2pt]
\leq & (\sqcup R\colon \mathcal{R}_N \bullet \mathrm{tr}.(R\times\Pi_I\times D)) & \text{``} I_N\in\mathcal{R}_N\text{''} \\[2pt]
= & \text{``Bayes Vulnerability of } \Pi_I\times D\text{''} & \text{``(10) in (D) above''} \\[2pt]
= & \text{``Bayes Vulnerability of } \Pi_I; C\text{''} . & \text{``(A) above; definition of } C\text{''}
\end{array}
\]

The structure of the argument is basically a reformulation on the S side, an
appeal to the separation property of the "pre-selected" matrix D, and then a
complementary un-reformulation on the I side. Thus for "see below" we argue
as follows.

To prepare D we consider all possible refinements of $\Pi_S$ together. These
refinements $\{R\colon \mathcal{R}_N \bullet R\times\Pi_S\}$ comprise a convex set of N×N matrices (where
convexity follows from (9) and linearity of matrix multiplication). Since $\Pi_I$ is
not a refinement of $\Pi_S$, we know $\Pi_I$ is not in that set. If we "flatten out" the
matrices into vectors of length $N^2$, say by glueing their rows together, then we
have a "point" $\Pi_I$ in Euclidean space that is strictly outside of that convex set
and by the Separating Hyperplane Lemma [35] there must be a plane with normal
X that strictly separates that whole set of refinements (including $\Pi = R\times\Pi_S$)
from the single point $\Pi_I$. The point X too will be a vector of length $N^2$ and,
written with matrices, the strict-separation condition is then that

\[ \mathrm{tr}.(\Pi\times X^{T}) \;<\; \mathrm{tr}.(\Pi_I\times X^{T}) \quad\text{for all } \Pi \text{ refining } \Pi_S \]

since the dot-product of two $N^2$-vectors A, B written as matrices of size N×N
is just $\mathrm{tr}.(A\times B^{T})$. This is precisely what we required above; and so our D is
made by taking the direction numbers of the separating hyperplane in Euclidean
$N^2$-space and turning them back into a matrix, and transposing the result.

We admit that there is no guarantee that the D constructed as above will
have one-summing rows. However, we can choose D to have all non-negative
coefficients because $\Pi_I$ and all the refinements $\Pi$ of $\Pi_S$ have the same weight,
and thus we can add any constant to all elements of D without affecting its
separating property; similarly we can scale it by any positive number. Thus we
can assume wlog that D is non-negative and that all its rows sum to no more
than 1. To then make each row of D sum to one exactly we can extend it with
an extra "column zero" whose entries are chosen just for that purpose. We then
need to guarantee -as a technical detail- that neither the Bayes Vulnerability
strategy matrix for $\Pi_S$ nor that for $\Pi_I$ chooses h to be zero. We do that, if necessary, by
adding a second context program that acts as skip when $h \neq 0$: but when h=0 it
executes a large probabilistic choice over h to distribute the value over enough
new values -1, -2, ... to make sure none of them individually will have a large
enough probability to attract a maximising choice. That gives us

Theorem 4. Refinement is complete for Bayes Risk If $S \not\sqsubseteq I$ then $C(S) \not\preceq C(I)$
for some context C. □
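The normalisation of D described in the proof (shift to make it non-negative, scale so that rows sum to at most one, then pad with a "column zero") can be sketched as follows. The function `to_stochastic` and the example matrices are hypothetical; the shift preserves strict separation only between partition matrices of equal total weight, as the proof requires.

```python
from fractions import Fraction as F

def to_stochastic(x):
    """Turn an arbitrary square matrix of separating coefficients into a matrix
    with non-negative entries and one-summing rows, by shifting, scaling and
    prepending an extra 'column zero' that absorbs the missing mass."""
    lo = min(min(row) for row in x)
    shifted = [[e - lo for e in row] for row in x]          # non-negative
    top = max(sum(row) for row in shifted) or F(1)          # avoid dividing by 0
    scaled = [[e / top for e in row] for row in shifted]    # rows sum to <= 1
    return [[1 - sum(row)] + row for row in scaled]

def trace_product(a, b):
    """tr(A x B) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

X = [[F(1), F(-2)], [F(0), F(3)]]            # made-up separating coefficients
A = [[F(1, 2), F(0)], [F(0), F(1, 2)]]       # two matrices of equal weight 1
B = [[F(0), F(1, 2)], [F(1, 2), F(0)]]

D = to_stochastic(X)
core = [row[1:] for row in D]                # D without its column zero
print(D)
print(trace_product(B, X) < trace_product(A, X),
      trace_product(B, core) < trace_product(A, core))
```

The extra column corresponds to a fresh hidden value; the proof then needs the second context program precisely so that this value never attracts the attacker's best guess.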



7.4 Maximal discrimination of the Bayes-Risk elementary order 

In this section only, we write "$\preceq_B$" for the Bayes-Risk based elementary testing
order $(\preceq)$, and we write "$\preceq_1$" etc. to stand generically for any similar order
based on one of the four alternative entropy measures set out in §1.2.



At most N new values will be required for such a context.
The problem discussed in §1.2 was that one could have $A \preceq_1 B$ and yet $B \prec_2 A$
for programs A, B and competing elementary orders $(\preceq_1)$ and $(\preceq_2)$. Similarly,
for any of the four $(\preceq_1)$ including $(\preceq_B)$ itself, it's easy to manufacture examples
where we have $A \preceq_1 B$ but there is a context C that reverses the comparison, so
that $C(B) \prec_1 C(A)$. This seems a hopelessly confused situation.

Luckily it turns out (§G) that refinement $(\sqsubseteq)$ is sound not only for $(\preceq_B)$ but
for the other three orders as well and -since $(\sqsubseteq)$ is complete for Bayes Risk-
that gives us

Theorem 5. Bayes Risk is maximally discriminating With context, Bayes
Risk is maximally discriminating among the orders of §1.2: that is if $(\preceq_1)$ is an
order derived from one of the entropies of §1.2, then whenever for two programs
S, I and all contexts C we have $C(S) \preceq_B C(I)$ we also have $C(S) \preceq_1 C(I)$ for all C.

Equivalently, if two programs A, B are distinguished by any $(\preceq_1)$ from §1.2,
that is $A \not\preceq_1 B$, then there is a context C such that $(\preceq_B)$ in particular distinguishes
C(A) and C(B), that is such that $C(A) \not\preceq_B C(B)$.

Proof: The equivalence of the first and second formulations is
straightforward; we prove the second, reasoning

\[
\begin{array}{cll}
 & A \not\preceq_1 B \\[2pt]
\Rightarrow & A \not\sqsubseteq B & \text{``soundness of } (\sqsubseteq) \text{ for } (\preceq_1)\text{, see §G''} \\[2pt]
\Rightarrow & C(A) \not\preceq_B C(B) . & \text{``completeness of } (\sqsubseteq) \text{ for } (\preceq_B)\text{; some context } C\text{''}
\end{array}
\]

□

It's the completeness result for $(\preceq_B)$ that makes it maximal, i.e. that seems
to single it out from among the other orders. Whether or not the other orders
are also complete is an open problem.



8 Case study: The Three Judges protocol 

The motivation for our case study is to suggest and illustrate techniques for rea- 
soning compositionally from specification to implementation of noninterference 
[27, 23, 11]. Our previous examples include (unboundedly many) Dining Cryptog- 
raphers [6], Oblivious Transfer [32] and Multi-Party Shared Computation [39]. 
All of them however used our qualitative model for compositional noninterference 
[27, 23]; here of course we are using instead a quantitative model. 



First implies second:

If $A \not\preceq_1 B$ then, appealing to the identity context in the conclusion of the first
formulation, for some C we have $C(A) \not\preceq_B C(B)$.

Second implies first:

Assume $C(S) \not\preceq_1 C(I)$ for some C, whence immediately from the second
formulation we have $D(C(S)) \not\preceq_B D(C(I))$ for some context $D(C(\cdot))$.

The example is as follows. Three judges A, B, C are to give a majority
decision, innocent or guilty, by exchanging messages but concealing their individual
votes a, b and c, respectively.

We describe this protocol with a program fragment, a specification which
captures exactly the functional and security properties we want. Its variables
are Boolean, equivalently {0, 1} and, including some notational conventions
explained below, it evaluates (a+b+c ≥ 2) atomically, and reveals the value of the
expression to everyone:

    vis_A a; vis_B b; vis_C c;     ← These are global variables.
    reveal (a+b+c ≥ 2) .           ← Atomically evaluate and reveal expression.      (12)



Note that this specification is not noninterference-secure in the usual sense: for
example when A judges "not guilty" (false) and yet the defendant is found guilty
by majority, Agent A learns that both b, c must have judged "guilty", and that
is a release of information. This allows a similar behaviour in the implementation,
strictly speaking a declassification: but we need no special measures to deal with
it.
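The release of information just described can be made concrete by enumerating what Agent A can still consider possible after the revelation. This is a purely possibilistic sketch (no prior over the votes is assumed), and the function name `a_view` is hypothetical.

```python
from itertools import product

def a_view(a, verdict):
    """Votes (b, c) consistent with A's own vote a and the revealed majority."""
    return [(b, c) for b, c in product((0, 1), repeat=2)
            if (a + b + c >= 2) == verdict]

# A voted "not guilty" (0) yet the verdict is guilty: b and c are fully exposed.
print(a_view(0, True))
# A voted "guilty" (1) and the verdict is guilty: three possibilities remain.
print(a_view(1, True))
```

Only the two "outvoted" and "decisive minority" cases, `a_view(0, True)` and `a_view(1, False)`, pin down both of the other votes; the specification permits exactly this leak and no more.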

We interpret the specification as follows. The system comprises four agents:
the judges A, B, C and (say) some Agent X as an external observer. The
participating agents (A, B, C) are distributed, each with its own state-space; and
the external observer has no state. The annotations $\mathrm{vis}_{\{A,B,C\}}$ above indicate
that the variables a, b, c are located with the agents A, B, C respectively and are
visible only to them: that is, only Agent A can see variable a etc.

The reveal command (explained in more detail below) publishes its argu- 
ment for all agents to see. 

The location of a variable has no direct impact on semantics (in our treat- 
ment here); but it does affect our judgement of what is directly executable and 
what is not. In particular, an expression is said to be localised just when all its 
variables are located at the same agent, and only localised expressions can be 
directly executed (by that agent, thus). Thus a+b+c ≥ 2 is not localised, in spite
of its being meaningful in the sense of having a well defined value; and it is
precisely because it is not localised that we must develop the specification further.
Assignment statements a:= E, where a is in Agent A, say, and E is localised in
Agent B, are implemented by B's calculating E and then sending its value in a
message to A.

The visibility of a variable does affect semantics. A variable annotated $\mathrm{vis}_A$,
for example, is treated as if it were simply annotated vis when we are reasoning



^^ Though this is similar to the (generalised) Dining Cryptographers, it is more difficult: 
we do not reveal anonymously the total number of guilty votes; rather we reveal only 
whether that total is a majority [11, Morgan:09a]. 

^■^ In principle we could have separate annotations for visibility and for location, allow- 
ing thus variables located at A that however A cannot see, and (complementarily) 
variables located at B that A can see. But in this example we do not need that fine 
control, and so we use vis for both. 
from Agent A's point of view; from the points of view of Agents B and C, it is treated as
if it were annotated hid; and the same applies analogously to the other agents.
Thus in the example we will treat three agents A, B, C each with her own view:
variables visible to one (declared vis) will be hidden from another (declared hid),
and vice versa. The "extra" Agent X (mentioned above) sees none of a, b, c,
but does observe the reveal. This simple approach is possible for us because 
we are not dealing with agents whose actions can be influenced by other agents' 
knowledge. 

In principle the vis-subscripting convention means that protocol development,
e.g. as in §8.3ff. to come, will require a separate proof for each observer
(since the patterns of variables' visibility might differ); but in practice we can
usually find a single chain of reasoning each of whose steps is valid for two or
even all three observers at once.

Before incrementally developing (12) into an implementation in order to lo- 
calise its expressions, we introduce some further extensions, including the reveal 
statement mentioned above [22], that will be used in the subsequent program 
derivation. 

8.1 Further program-language extensions 

Multiple- and local variables To this point we have had just two variables,
visible v and hidden h, and a split-state V×𝔻H to describe their behaviour. In
practice each of V, H will comprise many variables, represented in the usual
Cartesian way. Thus if we have variables a:A, b:B, c:C, d:D with the first two
a, b visible and the last two c, d hidden, then V is A×B and H is C×D so that
the state-space is A×B×𝔻(C×D). Assignments and projections are handled as
normal.

We allow local variables, both visible and hidden, which are treated (also) as
normal: within the scope of a visible local-variable declaration |[ vis x: X · · · ]|,
the V_local used is X×V_global. Hidden variables are treated similarly. ^^

^^ Implicitly local variables are assumed to be initialised by a uniform choice over
their finite state space. In our examples however, we always initialise local variables
explicitly, to avoid confusion.

Revelations Command reveal E publishes expression E for all to see: it is
equivalent to the local block

    |[ vis v; v:= E ]| ,                                                  (13)

but it avoids the small extra complexity of declaring the temporary visible-to-all
variable v and of having to introduce the scope brackets as (13) does.
The attraction of this is that the reveal command has a simple algebra of its
own, including for example that reveal E = reveal F just when E and F
are interdeducible given the values of (other) visible variables [22,24]. Thus for
example (and slightly more generally) we have

    reveal a∇b; reveal b∇c  =  reveal b∇c; reveal c∇a ,

using (∇) to denote exclusive-or, because from a∇b and b∇c an observer can
deduce both b∇c and c∇a, and vice versa.
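The interdeducibility claim can be checked by brute force. The following is a minimal qualitative sketch in Python, not part of the formal development: the helper names `partition`, `left` and `right` are ours, and `^` plays the role of exclusive-or. It confirms that observing the pair of values on the left groups the states (a, b, c) exactly as observing the pair on the right does.

```python
from itertools import product

def partition(observe):
    """Group all Boolean triples (a, b, c) by what an observer sees."""
    cells = {}
    for state in product([False, True], repeat=3):
        cells.setdefault(observe(*state), set()).add(state)
    # Only the grouping matters, not the labels attached to the cells.
    return {frozenset(cell) for cell in cells.values()}

def left(a, b, c):
    """Observations made by: reveal a xor b; reveal b xor c."""
    return (a ^ b, b ^ c)

def right(a, b, c):
    """Observations made by: reveal b xor c; reveal c xor a."""
    return (b ^ c, c ^ a)

# Equal partitions mean each pair of revelations can be deduced from the other.
same_information = partition(left) == partition(right)
print(same_information)
```

Because both sides reveal deterministic functions of the state, equal partitions suffice here; no probabilities are needed for this particular identity.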

Bulk atomicity In Fig. 1 we introduced the semantics of commands and remarked
that for syntactically atomic commands the secure semantics is given
by Def. 1, based on the classical semantics of the same command. With atomicity
brackets ⟨⟨·⟩⟩ we make groups of commands atomic "by fiat," so that Def. 1
applies to them as well. We have

Definition 3. Secure semantics atomicity brackets Given any program P we
define

    [[⟨⟨P⟩⟩]].(v,δ)  :=  embed.(⊙h:δ · [[P]]_C.(v,h)) .                    (14)

The effect overall, as earlier, is to impose the largest possible ignorance of h'
that is consistent with seeing v' and knowing the classical semantics [[P]]_C of
the program between brackets. In particular, perfect recall and implicit flow are
both suppressed by ⟨⟨·⟩⟩. □

A comparison of Defs. 1 and 3 shows immediately that for any syntactically
atomic command A we have A = ⟨⟨A⟩⟩, just as one would expect. ^^ With groups
of commands of course the equality does not hold in general: for example we
cannot reason

     v:= h; v:= 0
=    ⟨⟨v:= h⟩⟩; ⟨⟨v:= 0⟩⟩            "syntactically atomic, both"
?=   ⟨⟨v:= h; v:= 0⟩⟩                "invalid step"
=    ⟨⟨v:= 0⟩⟩                       "classical equality"
=    v:= 0 ,                         "syntactically atomic"

because -as we have often stressed- an assignment of h to v does reveal h to an
observer, even if v is immediately overwritten. The invalid step violates the
conditions of Lem. 1 immediately below, which gives an important special situation
in which we do have distribution of atomicity inwards:

Lemma 1. Distribution of atomicity Given is a sequential composition of two
programs P;Q. If by observation of the visible variable v before the execution of
P and after the execution of Q it is always possible to determine the value v had
between P and Q, then we do have

    ⟨⟨P;Q⟩⟩  =  ⟨⟨P⟩⟩; ⟨⟨Q⟩⟩ .

Proof: (sketch) The full proof is given in §E.

It can be shown that the left- and right-hand sides' classical effect on v and 
h are the same, and so the only possible difference between the two can be the 

^^ Note that although reveal E looks syntactically atomic, it is via (13) actually an
abbreviation of a compound command: thus in fact ⟨⟨reveal E⟩⟩ ≠ reveal E in
general. Actually ⟨⟨reveal E⟩⟩ = skip in all cases, whereas reveal E = skip only
when E is visible.
degree to which h is hidden. On the left, variable h must be maximally hidden
since that is the (defined) effect of the atomicity brackets ⟨⟨·⟩⟩. Thus we need
only argue that h is maximally hidden on the right as well.

Since h is maximally hidden after ⟨⟨P⟩⟩, the only way h can fail to be maximally
hidden after the subsequent ⟨⟨Q⟩⟩ is if there are two (or more) distinct
values of v after P, say v̂_{0,1}, each with its associated hidden distribution δ_{0,1}
of h, that are brought together to the same final value v' by execution of Q. For
that would mean that after Q we could have two distinct distributions δ'_{0,1}
of h associated with that single v', which is precisely what it means not to be
maximally hidden. Each δ'_{0,1} would have been derived from the corresponding
δ_{0,1} in between.

That scenario cannot occur if for any particular starting v before P that leads
to two (or more) values v̂_{0,1} between P and Q, we never have Q bringing those
values back together again to a single final value v'. That amounts to being able
to determine the intermediate value v̂ of v from its values before (v) and after
(v'). □

In fact our invalid step ⟨⟨v:= h; v:= 0⟩⟩ ≠ ⟨⟨v:= h⟩⟩; ⟨⟨v:= 0⟩⟩ above shows off the
condition exactly. Although v's intermediate value is indeed determined by the 
initial h, that is not good enough because we cannot see that h: we have access 
only to the initial v. And v's final value is always 0, again hiding v's intermediate 
value from us. Knowing v before and after, in this example, does not tell us its 
intermediate value (which is in fact h). 
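The side condition of Lem. 1 can be tested mechanically for small examples. The sketch below is an illustration under simplifying assumptions, not the paper's semantics: programs are modelled as deterministic Python functions from a state (v, h) to a state (v, h), and the names `intermediate_determined`, `copy_h`, `zero_v` and `incr_v` are hypothetical. It asks whether the pair (initial v, final v) always fixes the value v had in between.

```python
from itertools import product

def intermediate_determined(p, q, v_values, h_values):
    """Check Lem. 1's side condition for deterministic programs p; q.

    The condition holds when the observations (initial v, final v)
    determine the value that v had between p and q.
    """
    seen = {}
    for v, h in product(v_values, h_values):
        v_mid, h_mid = p(v, h)
        v_end, _ = q(v_mid, h_mid)
        key = (v, v_end)
        if seen.setdefault(key, v_mid) != v_mid:
            return False  # same observations, different intermediate v
    return True

def copy_h(v, h):
    return (h, h)        # v := h

def zero_v(v, h):
    return (0, h)        # v := 0

def incr_v(v, h):
    return (v + 1, h)    # v := v + 1

# v := h; v := 0 overwrites the evidence, so the condition fails.
print(intermediate_determined(copy_h, zero_v, [0, 1], [0, 1]))
# v := h; v := v + 1 keeps the intermediate value recoverable.
print(intermediate_determined(copy_h, incr_v, [0, 1], [0, 1]))
```

The first call corresponds to the invalid step discussed above; the second shows a composition for which atomicity would distribute.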

By definition, semantic equivalence of P and Q in the classical model entails
semantic equivalence of ⟨⟨P⟩⟩ and ⟨⟨Q⟩⟩; that is why within atomicity brackets
we can use classical equality reasoning.



8.2 Subprotocols: qualitative vs. quantitative reasoning 

Rather than appeal constantly to the basic semantics (Fig. 1) instead we have 
accumulated, with experience, a repertoire of identities -a program algebra- 
which we use to reason at the source level. Those identities themselves are proved 
directly in the semantics but, after that, they become permanent members of 
the designer's toolkit. One of the most common is the Encryption Lemma. 



The Encryption Lemma Let statement (v∇h):= E set Booleans v, h so that
their exclusive-or v∇h equals Boolean E: there are exactly two possible ways
of doing so. In our earlier work [27], we proved that when the choice is made
demonically, on a single run nothing is revealed about E; in our refinement style
we express that as

    skip  =  |[ vis v; hid h; (v∇h):= E ]| .                              (15)

In our current model we can prove that exactly the same identity holds provided
the choice of possible values for v and h is made uniformly:

Lemma 2. The Encryption Lemma For any Boolean expression E we have
that the following block is equal to skip, and so reveals nothing:

    |[ vis v; hid h; (v∇h):= E ]| .

For this we require that the implicit choice in (v∇h):= E is made uniformly.
Proof: We calculate

     |[ vis v; hid h; (v∇h):= E ]|
=    |[ vis v; hid h; ⟨⟨(v∇h):= E⟩⟩ ]|                        "syntactically atomic"
=    |[ vis v; hid h; ⟨⟨v:= true ⊕ false; h:= v∇E⟩⟩ ]|         "classical equality (†)"
=    |[ vis v; hid h; ⟨⟨v:= true ⊕ false⟩⟩; ⟨⟨h:= v∇E⟩⟩ ]|     "Lem. 1"
=    |[ vis v; hid h; v:= true ⊕ false; h:= v∇E ]|            "syntactically atomic"
=    |[ vis v; v:= true ⊕ false; |[ hid h; h:= v∇E ]| ]|      "hid h does not capture"
=    |[ vis v; v:= true ⊕ false ]|                             "assignment to local hidden is skip"
=    skip .                                                    "value assigned to local v is known already (‡)"

□

The crucial step in the proof above was the classical equality at (†), and we
note that other variations are possible: for example we also have the classical
equality

    (v∇h):= E  =  h:= true ⊕ false; v:= E∇h ,                            (16)

which suggests the operational procedure of "flipping a private coin h" and then
revealing (via assignment to local v) the exclusive-or of that private coin with
some expression E. The above reasoning shows that also to be equal to skip.

Finally, we recall that (⊕) means choose uniformly, and we now show that
it is essential for the (†) step and for the equality (16): if for example we had
h:= true p⊕ false; v:= E∇h in (16) on the right-hand side, but with p ≠ 1/2, it
would not be possible to rewrite that in the form v:= true 1/2⊕ false; h:= v∇E as
we had at (†) but with the 1/2 here exposed. Instead we'd have

    v:= ¬E p⊕ E; h:= v∇E

and the last step (‡) would then be invalid if E contained hidden variables (as
it usually would). The role of p = 1/2 is thus that the equality

    v:= ¬E 1/2⊕ E  =  v:= true 1/2⊕ false

holds no matter what expression E is, and in particular even if it contains hidden
variables; but only (in general) when the choice is with probability 1/2.
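The role of the probability 1/2 can be seen in a small exact calculation. The following sketch uses hypothetical helper names and treats E simply as a Boolean value `e`. It computes the distribution of the visible v after "h:= true p⊕ false; v:= E∇h", and then checks whether that distribution depends on E.

```python
from fractions import Fraction

def visible_distribution(e, p):
    """Distribution of v after  h := true p(+) false; v := E xor h."""
    dist = {True: Fraction(0), False: Fraction(0)}
    dist[e ^ True] += p          # h = true with probability p
    dist[e ^ False] += 1 - p     # h = false with probability 1 - p
    return dist

def reveals_nothing(p):
    """True when v is distributed identically whether E is true or false."""
    return visible_distribution(True, p) == visible_distribution(False, p)

print(reveals_nothing(Fraction(1, 2)))  # uniform coin: v is independent of E
print(reveals_nothing(Fraction(1, 3)))  # biased coin: v is correlated with E
```

With a biased coin the observer who sees v can improve her guess of E, which is exactly why the last step of the proof fails when p differs from 1/2.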

Lem. 2 means that extant qualitative source-level proofs that rely only on
"upgradeable identities" like (15) can be used as is for quantitative results
provided the demonic choices involved are converted to uniform choice. And that is
the case with our current example.

Beyond the Encryption Lemma, we use Two-party Conjunction [39] and
Oblivious Transfer [32] in our implementation. Just as for the Encryption Lemma,
the algebraic proofs of their implementations [27, 23] apply quantitatively
provided we interpret the (formerly) demonic choice as uniform. We now look briefly
at those subprotocols.




Two-Party Conjunction In the Two-Party Conjunction subprotocol, the conjunction
of two privately held Booleans is published without revealing either
Boolean separately. It is an instance of Yao's Multi-party Computation technique
[39] and we have given a formal derivation of it elsewhere [23]. Its specification
is

    Two-Party Conjunction
    vis_B b; vis_C c;          ← These are global variables.              (17)
    reveal b∧c ,

and its similarity to (12) is clear: a compound outcome b∧c is published without
revealing the components b, c; except that, just as before, if for example the
revealed outcome is false but b is true, then B can deduce that c must have been
false (and similar).

We develop an implementation of (17) in several steps, as follows. Note that
for some steps the justification varies depending on the agent although we have
arranged that the claimed equality is valid for all of them. We have

     (17)

=    skip;                                            "identity"
     reveal b∧c

=    |[ vis_B b0, b1; (b0∇b1):= b; reveal b0 ]|;      "Encryption Lemma for A, C;
     reveal b∧c                                        obvious for B; see below (‡)."

=    |[ vis_B b0, b1;                                 "Revelation algebra; in this context b∧c = b0∇b_c (†),
        (b0∇b1):= b; reveal b0;                        where b_c := (b1 if c else b0)."
        reveal b_c
     ]|

=    |[ vis_B b0, b1;                                 "Delegate second revelation to Agent C"
        (b0∇b1):= b; reveal b0;
        |[ vis_C c0; c0:= b_c; reveal c0 ]|
     ]|

=    |[ vis_B b0, b1; vis_C c0;                       "Rearrange declarations; clean up."
        (b0∇b1):= b; reveal b0;      ← This done by Agent B.
        c0:= b_c;                    ← An "Oblivious Transfer" between B, C.
        reveal c0                    ← This done by Agent C.
     ]| .

At (‡) we find a case where the same equality applies to all agents, although
in fact the reasons for its validity use agent-specific reasoning. For example, for
Agents A, C the fragment is effectively

    |[ hid b0, b1; (b0∇b1):= b; reveal b0 ]| ,

which is a version of the Encryption Lemma in which b0, being revealed, takes
the role of the local visible variable. Variable b1 is the local hidden, and (hidden)
variable b is the expression E (on which there are no restrictions). On the other
hand, for Agent B the fragment is

    |[ vis b0, b1; (b0∇b1):= b; reveal b0 ]|

with b a global visible: this is trivially equal to skip because all variables are
visible.

At (†) we use the revelation algebra mentioned in §8.1 to reason that once
b0 is revealed, going on to reveal b∧c is equivalent to revealing just b_c since
-knowing b0- we can calculate each of b∧c and b_c from the other. ^^
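The Boolean fact behind that step, that the exclusive-or of the first share with the selected share equals the conjunction of b and c, is small enough to verify exhaustively. A sketch, with `identity_holds` a hypothetical helper name and `^` standing for exclusive-or:

```python
from itertools import product

def identity_holds():
    """Check the xor of b0 with the selected share against b and c, everywhere."""
    for b, c, b0 in product([False, True], repeat=3):
        b1 = b ^ b0                  # enforce (b0 xor b1) = b
        b_c = b1 if c else b0        # the share Agent C will obtain
        if (b0 ^ b_c) != (b and c):
            return False
    return True

print(identity_holds())
```

Since the first share is revealed anyway, an observer can convert freely between the selected share and the conjunction, which is what the revelation algebra requires.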

More interesting than any of that, however, is that in the last step we appeal
to a further subprotocol by including the specification of the "Oblivious
Transfer Protocol" [31, 32]. Here Agent C has a private {0,1}-valued variable c
and obtains from Agent B either b0 or b1, depending on c. But Agent B does
not discover what c is, and Agent C does not discover b_{¬c}. We give a rigorous
implementation of the protocol elsewhere [27]; an informal explanation may be
found in §F.

Finally, to emphasise our earlier point about declassification, we suppose b is
true but c is false and thus that B learns c by noting that false is revealed overall;
note that this is a property of the specification. Now, in the implementation,
we can see how this happens: when b is true the local variables b_{0,1} will be
complementary and so -in spite of not learning c while the Oblivious Transfer is
carried out- Agent B will still learn c afterwards by comparing c0 with her own
b_{0,1}.



8.3 The Three-Judges implementation: first attempt 

We begin with an implementation attempt that fails, because this will illustrate 
two things. The first is that our model prevents incorrect developments, that is it 
stops us from constructing implementations less secure than their specifications: 
arguably this "negative" aspect of a method is its most important property, since 
it would be trivial to describe a method that allowed secure refinements. . . and 
all others as well. The key is what is not allowed. 

The second thing illustrated here is that a conditional if E · · · fi should be
considered to reveal its condition E implicitly. This implicit flow is a property
forced upon us by our advocacy of program algebra and our use of compositionality:
since the then- and the else-branch of a conditional can be developed
differently after the conditional has been introduced, we must expect that those
differences might reveal to an attacker which branch is being executed (and hence
the condition implicitly). This is exactly what we are about to see.

^^ We have

    b0∇b_c  =  (b0∇b1 if c else b0∇b0)  =  (b if c else false)  =  b∧c .

We start with some Boolean algebra

    (a+b+c ≥ 2)  =  a∧(b∨c) ∨ b∧c  =  (b∨c if a else b∧c)

and that suggests the first development steps

     reveal (a+b+c ≥ 2)

=    if a then reveal (a+b+c ≥ 2)              "P = if E then P else P fi"   (18)
          else reveal (a+b+c ≥ 2)
     fi

=    if a                                      "After then we can assume a;
     then reveal b∨c                            after else we can assume ¬a."
     else reveal b∧c
     fi · · ·



Now we can deal immediately with the else-part by adapting the Two-Party
Conjunction Protocol of §8.2 so that it reveals b∧c only to Agent A; we introduce
a pair of A-private local variables for that purpose. The result is

· · · =  |[ vis_A a_B, a_C; vis_B b0, b1; vis_C c0;    "Adapting §8.2"
            if a
            then reveal b∨c
            else (b0∇b1):= b;        ← Done privately by Agent B.
                 a_B:= b0;           ← Message B→A.
                 c0:= b_c;           ← Oblivious Transfer B→C.
                 a_C:= c0;           ← Message C→A.
                 reveal a_B∇a_C      ← Agent A announces majority verdict.
            fi
         ]| · · ·

For the then-part we write b∨c as ¬(¬b ∧ ¬c) and adapt the else-part
accordingly; the effect overall turns out to be replacing the initial b by ¬b and
changing the following assignment. Once we factor out the common portion of
the conditional, we have

· · · =  |[ vis_A a_B, a_C; vis_B b0, b1; vis_C c0;    "Using de Morgan"
            if a then (b0∇b1):= ¬b; a_B:= b1 else (b0∇b1):= b; a_B:= b0 fi;
            c0:= b_c; a_C:= c0;
            reveal a_B∇a_C
         ]| .
Now we see that the problem with going further is that Agent A must somehow
arrange that B carries out either (b0∇b1):= ¬b or (b0∇b1):= b, with that
arrangement depending on the value of A's private variable a. Since B's two
potential computations are different, there is no way this can occur without B's
learning the value of a in the process: this code is already incorrect.

Thus we must abandon this attempt, and admit that the questionable step
at (18) above was indeed wrong. In order to allow us to develop distributed
implementations, we make the (reasonable) assumption that each agent knows
the code it is instructed to execute, with those instructions coming possibly from
another agent. In this case Agent B must execute either (b0∇b1):= ¬b; a_B:= b1
or (b0∇b1):= b; a_B:= b0, depending on the value of a which is supposed to be
visible only to Agent A.

Our semantics recognises implicit flow, and does not allow in general the
transformation of P into if E then P else P fi, for exactly this reason. ^^
Similarly, a fragment a:= b; a:= c represents two messages, one B→A and then
a second one C→A; with perfect recall we recognise that A can learn b by
examining a after the first message has arrived, but before the second.

8.4 The Three-Judges implementation: second attempt (sketch) 

An "obvious" remedy for §8.3's problem, that Agent B is aware of which procedure
she must follow, is to make B follow both procedures, speculatively: she
does not know which one A will actually use.

The difficulty is now with Agent A, who learns both b∧c and b∨c. Although
those two values do not (always) determine b and c themselves, they do provide
strictly more information to A than her knowing a and (a+b+c ≥ 2) would have
provided on their own. ^^ Thus this approach fails also.
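The extra information gained by Agent A can be made concrete by listing the pairs (b, c) that remain possible from her point of view. The following sketch, with hypothetical helper names, takes the case where a is false and the verdict is false, first under the specification and then when both b∧c and b∨c are visible to her.

```python
from itertools import product

def candidates(observe, seen):
    """All pairs (b, c) consistent with what Agent A has observed."""
    return {(b, c) for b, c in product([False, True], repeat=2)
            if observe(b, c) == seen}

a = False  # Agent A's own vote in this illustration

def spec_view(b, c):
    """Under the specification A learns only the majority verdict."""
    return (a + b + c) >= 2

def attempt_view(b, c):
    """In the second attempt A sees both the conjunction and the disjunction."""
    return (b and c, b or c)

under_spec = candidates(spec_view, False)
under_attempt = candidates(attempt_view, (False, False))
print(sorted(under_spec))     # three possibilities remain
print(sorted(under_attempt))  # only one possibility remains
```

In this instance the second attempt lets A conclude that both other judges voted false, which the specification does not allow her to deduce.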

Our attention is therefore drawn to arranging for B (and C) to do both two-party
calculations, but then for A to get the results of only one of them. That
leads naturally to the approach of the next section, a combination of two two-party
computations (letting Agents B, C do both calculations) and two (more)
oblivious transfers (letting Agent A learn about only one of them). ^^

8.5 The Three-Judges implementation: successful development 

To repair the problem we encountered above we must arrange that Agents B, C 
as far as possible carry out procedures independent of A's variable a, in particular 



^^ In our related work for noninterference with demonic choice and without probability
[26,27], we give further arguments for this point of view, but based directly on
program algebra. The extra feature there is that even classical programs have a
non-trivial refinement relation; here, we have proper refinement only for secure programs.

^^ If a and (a+b+c ≥ 2) are both false, then Agent A concludes ¬(b∧c), for which
there are the three possibilities false/false, true/false, false/true. Agent A's
additionally knowing b∨c would eliminate at least one of those three.

^^ The "more" refers to the fact that the two-party computations have oblivious
transfers inside of them.
so that calculations relating to b∨c and to b∧c both occur, irrespective of which
result A actually needs.

To achieve this we need a slightly more general form of two-party computation.
We begin by introducing the specification of such a two-party conjunction,
with its variables made local so that the introduced code is equivalent to skip:

     reveal (a+b+c ≥ 2)

=    |[ vis_B b0; vis_C c0; (b0∇c0):= b∧c ]|;          "Two-party conjunction"
     reveal (a+b+c ≥ 2)

From Agent A's point of view, the introduced statement is trivially equivalent
to skip: all assignments are to local variables that A cannot see. From the points
of view of Agents B and C, it is equivalent to skip because it is an instance of the
Encryption Lemma: each of those two agents can see only one of the two variables
assigned-to, and so learns nothing about the expression b∧c. ^^

The statement (b0∇c0):= b∧c we have introduced is a more general form
of two-party conjunction than the reveal b∧c we illustrated earlier in §8.2;
that is because the conjunction is not actually revealed, not yet; instead it is 
split into two "shares," one belonging to each party B, C. Since each party has 
only one share, the conjunction is not revealed at all at this stage. But those 
shares can be used as inputs to further two-party computations, while preserving 
the security, and the contribution of the conjunction to a larger computation is 
revealed at a later point. 

The extra generality introduced by the shares does not cause us extra work 
here, since we are using only the specification for our reasoning and that (we will 
see) suffices. When we come to implement the general two-party conjunction in 
more primitive terms, however, we would then have further work to do. We have 
given such an implementation elsewhere [23]. 
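The claim that a single share reveals nothing about the conjunction can be checked by exact enumeration. This is a sketch only: it assumes, as the Encryption Lemma requires, that the split into shares is chosen uniformly, and the name `share_distribution` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def share_distribution(b, c, who):
    """Distribution of one party's share after (b0 xor c0) := b and c,
    assuming the split is chosen uniformly."""
    dist = {False: Fraction(0), True: Fraction(0)}
    for b0 in (False, True):               # uniform choice of B's share
        c0 = b0 ^ (b and c)                # C's share is then forced
        dist[b0 if who == "B" else c0] += Fraction(1, 2)
    return dist

uniform = {False: Fraction(1, 2), True: Fraction(1, 2)}
# Each share on its own is a fair coin, whatever b and c are.
shares_hide_result = all(
    share_distribution(b, c, who) == uniform
    for b, c in product([False, True], repeat=2)
    for who in ("B", "C")
)
print(shares_hide_result)
```

Only the exclusive-or of the two shares carries the conjunction, so nothing is disclosed until the shares are later combined.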

With exactly the same reasoning as above we can introduce two-party dis- 
junction and, with both conjunction and disjunction present, perform some re- 
organisation: 



^^ The following bogus counter-argument is an example of what having a careful
definition of equality and refinement helps us to avoid.

"Agent B might know that b is false, and then perhaps receive false also in b0.
She concludes that c0 is also false, which is a leak since c0 is supposed to be private
to C, invisible to B."

In fact this is not a leak, because to judge it so we must refer to the specification
of this fragment. But that is simply skip and there is no c0 declared there: the
revealed variable is local to the implementation only.

That is, publishing the value of a hidden variable declared only in the
implementation might look like a leak in the conventional interpretation -consider
|[ vis_C c; · · · ; reveal c ]| for example- but it is actually a leak only if that variable
c has come to contain information (via assignments say in the "· · ·" portion) about
other, more global hidden variables that were present in the specification, originally.
Our semantics checks for that automatically.
=    |[ vis_B b0; vis_C c0; (b0∇c0):= b∧c ]|;          "Two-party disjunction"
     |[ vis_B b1; vis_C c1; (b1∇c1):= b∨c ]|;
     reveal (a+b+c ≥ 2)

=    |[ vis_B b0, b1; vis_C c0, c1;                     "Reorganise declarations and scoping"
        (b0∇c0):= b∧c;
        (b1∇c1):= b∨c;
        reveal (a+b+c ≥ 2)
     ]|

=    |[ vis_B b0, b1; vis_C c0, c1;                     "Boolean algebra"
        (b0∇c0):= b∧c;
        (b1∇c1):= b∨c;
        reveal b_a∇c_a
     ]|

Now since b_a∇c_a is revealed to everyone, and thus to A in particular, it does
no harm first to capture that value in variables of A, and then to have Agent A
reveal those instead:

=    |[ vis_A a_B, a_C;                                 "Introduce local private variables of A"
        vis_B b0, b1;
        vis_C c0, c1;
        (b0∇c0):= b∧c;
        (b1∇c1):= b∨c;
        (a_B∇a_C):= b_a∇c_a;
        reveal a_B∇a_C
     ]|

The point of using two variables a_{B,C} rather than one is to be able to split
the transmission of information B,C→A into two separate oblivious transfers
B→A and C→A.

Thus the protocol boils down to three two-party computations: a conjunction
b∧c, a disjunction b∨c and an exclusive-or b_a∇c_a. The rhs of the last is actually
within an atomic conditional on a, that is (b0∇c0) if a=0 else (b1∇c1).
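That the selected pair of shares always combines to the majority verdict can be confirmed by enumeration over all votes and all choices of B's shares. A sketch, with `spec_matches_majority` a hypothetical helper name:

```python
from itertools import product

def spec_matches_majority():
    """Check that the xor of the selected shares equals the majority verdict."""
    for a, b, c, b0, b1 in product([False, True], repeat=5):
        c0 = b0 ^ (b and c)          # (b0 xor c0) := b and c
        c1 = b1 ^ (b or c)           # (b1 xor c1) := b or c
        b_a = b1 if a else b0        # B's share selected by A's vote
        c_a = c1 if a else c0        # C's share selected by A's vote
        if (b_a ^ c_a) != (a + b + c >= 2):
            return False
    return True

print(spec_matches_majority())
```

This checks functional correctness only; the security of the step rests on the refinement argument in the text.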



8.6 Two-party exclusive-or 

Our final step is to split the two-party exclusive-or into two separate assignments.
This is achieved by introducing a local shared variable h that is visible to B, C
only, i.e. not to A, and encrypting both hidden variables with it. Thus we take
the step

     (a_B∇a_C):= b_a∇c_a

=    |[ vis_{B,C} h;
        h:= true ⊕ false;
        a_B:= b_a∇h;
        a_C:= c_a∇h
     ]| ,

justified trivially for B, C since the only assignments of non-constants are to
variables visible only to A. For A the justification comes from the use of
classical equality reasoning within a temporary atomicity block (refer to Lem. 1):
the effect of the two fragments above on a_B, a_C is identical, and there are no
overwritten visible values.

We will now show that in fact the extra variable h is not necessary: by
absorbing it into earlier statements, and with some rearrangement of scopes we
can rewrite our code at the end of §8.5 as

· · · =  |[ vis_A a_B, a_C; vis_B b0, b1; vis_C c0, c1; vis_{B,C} h;
            h:= true ⊕ false;
            (b0∇c0):= b∧c;
            (b1∇c1):= b∨c;
            a_B:= b_a∇h;
            a_C:= c_a∇h;
            reveal a_B∇a_C
         ]| ,

where in fact we have moved the declaration and initialisation of h right to
the beginning. We now absorb it into the earlier two-party computations by
introducing temporarily variables b'_{0,1} and c'_{0,1} which correspond to their
unprimed versions except that they, too, are encrypted with h. That gives

· · · =  |[ vis_A a_B, a_C;                              "Boolean reasoning"
            vis_B b0, b1, b'0, b'1;
            vis_C c0, c1, c'0, c'1;
            vis_{B,C} h;
            h:= true ⊕ false;
            (b0∇c0):= b∧c;   b'0, c'0 := b0∇h, c0∇h;
            (b1∇c1):= b∨c;   b'1, c'1 := b1∇h, c1∇h;
            a_B:= b'_a;      ← Note primes, justified by earlier
            a_C:= c'_a;      ← assignments to b'_{0,1}, c'_{0,1}.
            reveal a_B∇a_C
         ]| ,

where we have replaced the b_a∇h and c_a∇h at the end of the code with their
simpler, primed versions where the encryption is built in. Now we can rearrange
the statements using h so that not only h but also the unprimed b_{0,1} and c_{0,1}
become auxiliary; that is, we have for the conjunction

     (b0∇c0):= b∧c; b'0, c'0 := b0∇h, c0∇h
    |[ vis_A a_B, a_C; vis_B b0, b1; vis_C c0, c1;
       b0:= true ⊕ false;
       b1:= true ⊕ false;

       c0:= (b∇b0 if c else b0);      ⎫
       c1:= (¬b1 if c else b∇b1);     ⎬  Four oblivious transfers.
       a_B:= (b1 if a else b0);       ⎪
       a_C:= (c1 if a else c0);       ⎭

       reveal a_B∇a_C
    ]|

We replace the two Two-Party 'junctions by their implementations as oblivious
transfers: each becomes two statements instead of one. The random flipping of bits
b_{0,1} is then collected at the start.

The preservation of correctness is guaranteed by the compositionality of the security
semantics.

Fig. 2. Three-Judges Protocol assuming Oblivious Transfers as primitives



=    ⟨⟨ (b0∇c0):= b∧c; b'0, c'0 := b0∇h, c0∇h ⟩⟩          "introduce atomicity"
=    ⟨⟨ (b'0∇c'0):= b∧c; b0, c0 := b'0∇h, c'0∇h ⟩⟩        "classical equality"
=    (b'0∇c'0):= b∧c; b0, c0 := b'0∇h, c'0∇h ,             "remove atomicity"

and similarly for the disjunction. Removing the auxiliaries, and then applying a
trivial renaming to get rid of the primes, we end up with

· · · =  |[ vis_A a_B, a_C; vis_B b0, b1; vis_C c0, c1;   "Consolidating the above"
            (b0∇c0):= b∧c;     ← Two-Party Conjunction (contains Oblivious Transfer).
            (b1∇c1):= b∨c;     ← Two-Party Disjunction (contains Oblivious Transfer).
            a_B:= b_a;         ← Oblivious Transfer.
            a_C:= c_a;         ← Oblivious Transfer.
            reveal a_B∇a_C
         ]| ,

which is precisely what we sought.

In Fig. 2 we give the code with the (two) two-party computations instantiated.
In Fig. 3 we instantiate one of the (four) oblivious transfers.
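The code of Fig. 2 is small enough to be executed exhaustively. The sketch below is an illustration rather than a proof: it treats each oblivious transfer as an ideal primitive, assumes a uniform prior on b and c, and uses hypothetical names such as `run` and `posterior`. It checks that the protocol announces the majority verdict, and that what Agent A can infer about (b, c) from her complete view (a, a_B, a_C) is exactly what she could infer from a and the verdict alone.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BOOLS = (False, True)

def run(a, b, c, b0, b1):
    """One run of Fig. 2; b0 and b1 are Agent B's two coin flips."""
    c0 = (b ^ b0) if c else b0          # oblivious transfer B -> C
    c1 = (not b1) if c else (b ^ b1)    # oblivious transfer B -> C
    a_b = b1 if a else b0               # oblivious transfer B -> A
    a_c = c1 if a else c0               # oblivious transfer C -> A
    return a_b, a_c, a_b ^ a_c          # the last component is revealed

def computes_majority():
    return all(run(a, b, c, b0, b1)[2] == (a + b + c >= 2)
               for a, b, c, b0, b1 in product(BOOLS, repeat=5))

def posterior(a, view_of):
    """A's posterior over (b, c) for each of her possible views, given a."""
    table = {}
    for b, c, b0, b1 in product(BOOLS, repeat=4):
        a_b, a_c, verdict = run(a, b, c, b0, b1)
        table.setdefault(view_of(a_b, a_c, verdict), Counter())[(b, c)] += 1
    return {view: {bc: Fraction(n, sum(counts.values()))
                   for bc, n in counts.items()}
            for view, counts in table.items()}

def a_learns_only_verdict():
    for a in BOOLS:
        full = posterior(a, lambda a_b, a_c, verdict: (a_b, a_c))
        spec = posterior(a, lambda a_b, a_c, verdict: verdict)
        if any(full[(a_b, a_c)] != spec[a_b ^ a_c] for a_b, a_c in full):
            return False
    return True

print(computes_majority(), a_learns_only_verdict())
```

The analogous checks for Agents B and C would follow the same pattern, with their own views substituted for that of A.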

9 Conclusion: a challenge and an open problem 

We have investigated the foundations for probabilistic non-interference security 
by proposing a semantics, and a refinement order between its programs, which 
we have demonstrated has connections with existing entropy-based measures. 
Especially it is related to Bayes Risk and we have given a soundness and com- 
pleteness result that establishes compositional closure. 

Our approach has a general goal: to justify practical methods which support 
accurate analysis of programs operating in a context of probabilistic uncertainty. 
Starting from Fig. 2, we replace the specification of the first of its four oblivious
transfers c0:= (b∇b0 if c else b0) by an implementation in elementary terms [23]:

    |[ vis_A a_B, a_C; vis_B b0, b1; vis_C c0, c1;
       b0:= true ⊕ false;
       b1:= true ⊕ false;

       |[ vis_B m0, m1; vis_C c', m';
          c':= true ⊕ false;
          m0:= true ⊕ false; m1:= true ⊕ false;
          m':= m_{c'};               ← Done in advance by trusted third party. †

          vis_{ABC} x, y0, y1;       ← Note these are visible to all three agents.
          x:= c∇c';
          y0:= b0∇m_x;
          y1:= b∇b0∇m_{¬x};
          c0:= y_c∇m'                ← Although y_c is public, only Agent C knows m'. ‡
       ]|;

       c1:= (¬b1 if c else b∇b1);    ⎫  Three more oblivious transfers,
       a_B:= (b1 if a else b0);      ⎬  each one to be
       a_C:= (c1 if a else c0);      ⎭  expanded as above.

       reveal a_B∇a_C
    ]|

Each of the other three transfers would expand to a similar block of code, making
about 40 lines of code in all. The Oblivious Transfer is formally derived elsewhere [27];
an informal explanation is given in §F.

The preservation of correctness, under expansion, is again guaranteed by the
compositionality of the security semantics.

Note that aside from the statement marked † (and its three other instances within the
three other, unexpanded oblivious transfers), all messages are wholly public because of
the declarations vis x, y0, y1; that is, all the privacy needed is provided already by the
exclusive-or'ing with hidden Booleans, as ‡ shows.

The only private communications († × 4) are done with the aid of a trusted third party.
As explained by Rivest [32, 27] this party's involvement occurs only before the protocol
begins, and it is trusted not to observe any data exchanged subsequently between the
agents; alternatively, the subsequent transfers can themselves be encrypted without
affecting the protocol's correctness. (A trusted third party without these limitations
would implement trivially any protocol of this kind, simply by collecting the secret
data, processing it and then distributing the result.)

Fig. 3. Three-Judges Protocol in elementary terms
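The expanded transfer of Fig. 3 can be checked for functional correctness in the same exhaustive style. The following sketch reflects our reading of the figure: the function name `oblivious_transfer` and its parameter names are hypothetical, and the bits `m0`, `m1` and `c_prime` stand for the values dealt in advance by the trusted third party.

```python
from itertools import product

BOOLS = (False, True)

def oblivious_transfer(msg0, msg1, c, m0, m1, c_prime):
    """Agent C obtains msg1 if c else msg0, using public messages x, y0, y1.

    B holds the pads (m0, m1); C holds c_prime and the single pad m_prime.
    """
    m_prime = m1 if c_prime else m0          # m' := m_{c'}, dealt in advance
    x = c ^ c_prime                          # sent publicly by C
    y0 = msg0 ^ (m1 if x else m0)            # y0 := msg0 xor m_x
    y1 = msg1 ^ (m0 if x else m1)            # y1 := msg1 xor m_{not x}
    received = (y1 if c else y0) ^ m_prime   # c0 := y_c xor m'
    return received, (x, y0, y1)

def transfer_is_correct():
    """Compare against the specification c0 := (b xor b0 if c else b0)."""
    for b, b0, c, m0, m1, c_prime in product(BOOLS, repeat=6):
        received, _ = oblivious_transfer(b0, b ^ b0, c, m0, m1, c_prime)
        if received != ((b ^ b0) if c else b0):
            return False
    return True

print(transfer_is_correct())
```

The second component returned by the function collects the public messages, so that the same enumeration could be extended to examine what an outside observer sees.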



Abstraction underlies tractable analysis, but the results of such analyses become 
relevant only if the method of abstraction aptly preserves the properties intended 
for examination. The impact of this research is to show firstly that our refinement 
order aptly characterises Bayes Risk, and secondly that the former discrepancies 
between Bayes Risk and other information orders can be rationalised by taking
contexts into account.

By taking a fresh point of view, we have related entropies that were formerly 
thought to be inconsistent. Furthermore, we highlight the similarities between 
non-interference (as defence against an adversary) and large-scale structuring 
techniques (such as stepwise refinement and its associated information hiding 
[29]) for probabilistic systems. Both require a careful distinction between what
data can be observed and what data must be protected; by observing that
distinction in the definition of abstraction, we allow the tractable analysis of
properties which rely on "secrecy" (on the one hand) or "probabilistic local state"
(on the other). This unified semantic foundation opens up the possibility for a
uniform approach to the specification of security properties, along with other 
safety-critical features, during system design [20]. 

These positive results now present a challenge and an open problem. The 
challen ge is to find a model where all three features, probability, nondetermin- 
ism and hidden state, can reside together, and then an equivalence between 
semantic objects which respects an appropriate definition of testing. The pres- 
ence of nondeterminism would then include a treatment of distributed systems 
with schedulers having a restricted view of the state [4] ; that is because nondeter- 
minism can be interpreted either as underspecification, or as a range of decisions 
presented to a scheduler. Within such a model we would be increasing the power 
of the adversary to harvest information about the hidden state by increasing the 
expressivity of the contexts she can create. It is an open problem whether that
increased power is sufficient to make the various information-theoretic orders 
(Bayes Risk, Shannon Entropy, Marginal Guesswork etc.) equivalent or whether 
they remain truly distinct. 



Related techniques 

The use of information orders, such as those summarised in §1.2, to determine 
the extent to which programs leak their secrets is widespread. Early work that 
took this approach includes [25, 38, 13], and more recently it has been employed 
in [34, 15, 1, 7, 18]. One of the contributions of this paper is to show how those 
evaluations can be related by taking a refinement-oriented perspective. Composi- 
tionality plays a major role in our definition of refinement and we note that other 
orders between probability distributions such as the "peakedness" introduced by 
Dubois and Hüllermeier [9] appear not to be compositional when generalised to
our hyperdistributions. 

More significant than the particular information order is the way that it is 
used in the analysis of programs. Our approach uses specifications to charac- 
terise permitted leaks, and a refinement order which ensures that for our chosen 
information order (i.e. Bayes Risk), the implementation is at least as secure as 
its specification. An alternative mode is taken by Braun et al. [1]. Rather than
restricting the elementary testing-relation $(\preceq)$ to a compositional subset $(\sqsubseteq)$,
they identify the safe contexts $C_{\mathit{safe}}$ such that $I \preceq C_{\mathit{safe}}(I)$. With our emphasis




on implementations $I$ and their specifications $S$, by analogy we would be looking
for $S \preceq I$ implies $C_{\mathit{safe}}(S) \preceq C_{\mathit{safe}}(I)$.

Building on the theoretical approaches, others have investigated the use of 
automation to evaluate the quantitative weaknesses in programs. Heusser and 
Malacaria [12], for example, have automated a technique based on Shannon 
entropy. Andrés et al. [17] similarly consider efficient calculation of information
leakage, which can provide diagnostic feedback to the designer. 

In some ways our semantics is related in structure to Hidden Markov Models 
[14] suggesting that, in the future, the algorithmic methods developed in that 
field might apply to the special concerns of program development. A Hidden 
Markov Model considers a system partitioned into hidden states (our h) and 
observable outputs (similar to our v). The h-state evolves according to a Markov 
Chain, in our terms repeated execution of a fragment $h {:\in}\, D.h$ in which the
probability of the next state $h'$ is given by a fixed "matrix" $D$ as $D.h.h'$ where
$h$ is the current state. Associated with each transition is an observation, in our
terms execution of a fragment $v {:\in}\, E.h$. Put together, therefore, the HMM evolves
according to repeated executions of the fragment

$$h {:\in}\, D.h;\ \ v {:\in}\, E.h\ , \qquad (19)$$

which fragment is a special case of our probabilistic-choice statements since the
distributions on the right in (19) do not depend on v, whereas in Fig. 1 they can.

The canonical problems associated with HMMs are (in the terms above)

1. Given the source code (that is, the matrices D,E), compute the probability 
of observing a given sequence of values assigned to v. 

2. Given a sequence of output values, determine the most likely values of D, E. 

3. Given the source code and a particular sequence of values assigned to v, 
calculate the sequence of values assigned to h that was most likely to have 
occurred. 

The first of those is basically the classical semantics [16,21], but projected
onto v since we are not interested in h's values. The second we do not treat at
all — it is tantamount to trying to guess a program's source code (in a limited 
repertoire) given the outputs it produces. The third is closest to our security 
concerns, since it is in a sense trying to guess h from observation of v. 
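
To make the HMM reading concrete, here is a minimal Python sketch of one execution of the fragment `h :∈ D.h; v :∈ E.h`, covering the first canonical problem (probability of an observed sequence) and, in a one-step sense, the "guess h from v" flavour of the third. It assumes an encoding of our own choosing: states and observations are numbered from 0, and `D` and `E` are nested lists of exact fractions. The function names are hypothetical and not taken from the paper.

```python
from fractions import Fraction

def step(prior, D, E, obs):
    """One execution of `h :∈ D.h; v :∈ E.h` in which v was observed to be `obs`.

    Returns (probability of that observation, posterior distribution over h).
    """
    n = len(prior)
    # h :∈ D.h  -- push the prior over h through the transition matrix D
    moved = [sum(prior[h] * D[h][h2] for h in range(n)) for h2 in range(n)]
    # v :∈ E.h  -- weight each (new) h by the chance that it emits `obs`
    joint = [moved[h] * E[h][obs] for h in range(n)]
    p_obs = sum(joint)
    posterior = [j / p_obs for j in joint] if p_obs else joint
    return p_obs, posterior

def sequence_probability(prior, D, E, observations):
    """Probability of seeing `observations` in order, plus the final posterior over h."""
    prob = Fraction(1)
    for obs in observations:
        p, prior = step(prior, D, E, obs)
        prob *= p
    return prob, prior
```

The largest entry of the returned posterior is the chance that a single best guess of h succeeds, which is the quantity that Bayes Vulnerability averages over observations.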

But in fact we address none of the three problems directly, since even in 
the third case we have a different concern: in HMM terms we are comparing
two systems $D_S,E_S$ and $D_I,E_I$, asking whether -according to certain entropy
measures- the a-posteriori distribution of the final value of h is
at least as secure in system $D_I,E_I$ as it is in $D_S,E_S$. Furthermore, our concern
with compositionality would in HMM terms relate to the question of embedding
each of $D_S,E_S$ and $D_I,E_I$ "inside" another system $D,E$.

The application of HMM techniques to our work would in the first instance 
probably be in the efficient calculation of whether $D_S,E_S$, the specification, was
secure enough for our purposes: once that was done, the refinement relation
would ensure that the implementation $D_I,E_I$ was also secure enough, without
requiring a second calculation. The advantage of this is that the first calculation,
over a smaller and more abstract system, is likely to be much simpler than the 
second would have been.

A Proofs for partition-based matrix representations

We give here the proofs for properties we relied on in §7. 



Property 9 (in §7.1): Convex closure of refinement matrices To show
that the set of $N{\times}N$ refinement matrices is the convex closure of the transpose
of the set of $N{\times}N$ strategy matrices, i.e. that

$$\mathcal{R}_N \;=\; \mathrm{ccl}.\mathcal{M}_N\ ,$$

we first observe $(\supseteq)$ that every element in $\mathrm{ccl}.\mathcal{M}_N$ is trivially non-negative and
column-one-summing (that is, it is a refinement matrix). It remains to show $(\subseteq)$
that any refinement matrix $R$ in $\mathcal{R}_N$ can be expressed as an interpolation of
matrices from $\mathcal{M}_N$.

We argue as follows. Fix $R$, and identify a non-zero minimum element in
each of its columns; let $c$ be the minimum of those column-minima; select the $M$
in $\mathcal{M}_N$ that has 1's in the column-minima positions exactly; and subtract $cM$
from $R$ to give some $R'$.

Now $R'$ has at least one more 0 entry than $R$ did, and yet the columns of
$R'$ still have equal sums, now $1-c$. Continue this process from $R'$ onwards: it
must stop, since the number of 0's increases each time; and when it does stop
it must be because there is an all-0 column, in which case all columns must be 
all-0, since the column sums have remained the same all the way through. 

The collection of M's and their associated c's is the interpolation we had to 
find: for example, in three steps the procedure generates the interpolation 
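
The peeling procedure just described can be sketched in Python as follows. This is an illustration under our own assumptions rather than the paper's presentation: matrices are nested lists, arithmetic uses exact fractions, ties between column minima are broken by taking the topmost row, and the name `decompose` is hypothetical.

```python
from fractions import Fraction

def decompose(R):
    """Peel a column-one-summing matrix R into pairs (c, M), where each M is a
    0/1 matrix with exactly one 1 per column, so that R is the sum of c*M."""
    n_rows, n_cols = len(R), len(R[0])
    R = [[Fraction(x) for x in row] for row in R]   # work on an exact copy
    parts = []
    while any(R[i][j] != 0 for i in range(n_rows) for j in range(n_cols)):
        # Row position of a smallest non-zero entry in each column.
        picks = [min((i for i in range(n_rows) if R[i][j] != 0),
                     key=lambda i: R[i][j])
                 for j in range(n_cols)]
        # c is the minimum of those column-minima.
        c = min(R[picks[j]][j] for j in range(n_cols))
        M = [[1 if picks[j] == i else 0 for j in range(n_cols)]
             for i in range(n_rows)]
        # Subtract c*M; at least one more entry becomes zero.
        for j in range(n_cols):
            R[picks[j]][j] -= c
        parts.append((c, M))
    return parts
```

Because every column loses exactly `c` at each step, the column sums stay equal throughout, so the loop ends only when the whole matrix is zero and the collected weights `c` sum to 1.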



Property (11) (in §7.1): refinement matrices form a monoid Since matrix
multiplication is associative and the identity $1_N$ is an element of $\mathcal{R}_N$, we
need only demonstrate that $\mathcal{R}_N$ is closed under multiplication. That can be
checked by direct calculation.
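
One way the direct calculation might run is sketched below. Here $R$ and $S$ are arbitrary refinement matrices of compatible dimensions (the names are ours), and the only facts used are that their entries are non-negative and that each of their columns sums to 1.

```latex
% Column j of the product still sums to 1, and its entries remain non-negative.
\sum_i (R\times S)_{ij}
  \;=\; \sum_i \sum_k R_{ik}\,S_{kj}
  \;=\; \sum_k \Bigl(\sum_i R_{ik}\Bigr) S_{kj}
  \;=\; \sum_k S_{kj}
  \;=\; 1 ,
\qquad
(R\times S)_{ij} \;=\; \sum_k R_{ik}\,S_{kj} \;\ge\; 0 .
```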



B Secure semantics via matrices 

In §7 we appealed to matrix representations of partitions to construct our proof 
that $(\sqsubseteq)$ is the compositional closure of $(\preceq)$. Here we project the rest of
our semantics into matrix algebra, giving matrix representations of split-states,
hyper-distributions, programs, and refinement. These representations are used to
verify both monotonicity (Thm. 2 from §6.5) in §C.2 and the Atomicity Lemma
(Lem. 1 from §8.1) in §E.

B.1 Notation

For $i$ taken from some ordered index set $I$ we will write $(\ddagger\, i{:}\,I \mid R \cdot M_i)$ for the
vertical concatenation of matrices $M_i$ for those $i$ satisfying $R$, taken in $I$-order:
for this to be well defined, the column-count must be the same for all $M$'s; but
their row-counts may differ. In the same way we will write $({+\!\!+}\, i{:}\,I \mid R \cdot M_i)$ for
horizontal matrix concatenation (in which case the row-counts must agree).

For a given dimension $N$ and expression $E.n$ we write $\lVert E.n\rVert$ for the $N{\times}N$
diagonal matrix whose value at the element (doubly) indexed $n$ is $E.n$. Thus for
example we have $1 = \lVert 1\rVert$.

B.2 Split-states as single-column matrices 

Let $N$ be the cardinality of $\mathcal{V}{\times}\mathcal{H}$. A split-state has type $\mathcal{V}{\times}\mathbb{D}\mathcal{H}$ and can be
written as a $1{\times}N$ matrix, a row of probabilities in some agreed-upon index
order of $\mathcal{V}{\times}\mathcal{H}$ where the element at (column) index $(v,h)$ gives the probability
$\delta.h$ associated with that pair.

Naturally the row sums to 1 but -more than that- each such representation
of a split-state will have nonzero entries only in columns whose first
index-component is the $v$ appearing in $(v,\delta)$. We say that such a row is V-unique and
that it has characteristic $v$.

Write $1_v$ for the $N{\times}N$ diagonal matrix $\lVert 1 \text{ if } v'{=}v \text{ else } 0\rVert$ having ones only at
positions whose row- (or equivalently column-) index has that $v$ as its first
component; elsewhere in the diagonal (and everywhere off the diagonal) the entries
are zero. The row-matrix representation $((v,\delta))$ of split-state $(v,\delta)$ then satisfies
$((v,\delta)) = ((v,\delta)) \times 1_v$ because it has characteristic $v$, so that the multiplication by
$1_v$ sets to zero only elements that were zero already.
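
As an illustration, the row representation and the projector $1_v$ can be sketched in Python. The concrete value sets `V` and `H`, the index order, and the helper names are all assumptions made for this example only.

```python
from fractions import Fraction as F

V = [0, 1]                                # hypothetical visible values
H = [0, 1, 2]                             # hypothetical hidden values
INDEX = [(v, h) for v in V for h in H]    # agreed-upon index order of V x H

def split_state_row(v, delta):
    """Row for split-state (v, delta): delta.h at column (v, h), zero elsewhere."""
    return [delta.get(h, F(0)) if v2 == v else F(0) for (v2, h) in INDEX]

def one(v):
    """Diagonal matrix 1_v: ones exactly where the index has first component v."""
    n = len(INDEX)
    return [[F(1) if (i == j and INDEX[i][0] == v) else F(0) for j in range(n)]
            for i in range(n)]

def mul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

row = split_state_row(1, {0: F(1, 4), 2: F(3, 4)})
unchanged = mul([row], one(1))   # characteristic 1: multiplying by 1_1 changes nothing
wiped = mul([row], one(0))       # projecting onto the other v leaves an all-zero row
```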

B.3 Hyper-distributions as matrices 

In §7.1 we interpreted whole partitions as matrices, with each row (fraction)
giving a possible distribution over $\mathcal{H}$ for some fixed $v$. Here we proceed similarly,
but we do not fix $v$, so that a hyper-distribution $\Delta$ whose support has cardinality
$F$ is represented as an $F{\times}N$ matrix $((\Delta))$ each of whose rows is V-unique, as
above, thus independently representing some split-state $(v,\delta)$. Extending the
matrix representation of individual split-states, we can represent whole
hyper-distributions according to

$$((\Delta)) \;:=\; (\ddagger\,(v,\delta){:}\,\lceil\Delta\rceil \,\cdot\, \Delta.(v,\delta) * ((v,\delta))) \qquad (20)$$

where, as in §7.1, with the multiplier $\Delta.(v,\delta)$ we are scaling the rows so that
the total weight of each gives the probability of that split-state in the
hyper-distribution overall; the distribution the split-state actually contains (the $\delta$ in
the $(v,\delta)$ that the row represents) is as usual recoverable by normalising. Because
each of the rows is V-unique we say that the matrix as a whole, also, is V-unique;
but note that it is possible to have several rows with the same characteristic $v$.
V-uniqueness means that no two distinct $v$'s appear with non-zero probability
in the same row.

As for partitions, in such matrices we define similarity between rows and
say that a hyper-distribution is in reduced matrix representation if all its similar
rows have been added together, and all its all-zero rows have been removed. We
say that two hyper-distribution matrices are similar $(\approx)$ if their reductions are
equivalent up to a reordering of rows. Similarity is a congruence for matrix
multiplication on the right, but not on the left; vertical concatenation $(\ddagger)$ respects
similarity on both sides.
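
Reduction and similarity can be sketched in Python as below. The sketch assumes, as we read §7.1, that two rows are "similar" when they are scalar multiples of one another (that is, they normalise to the same distribution); the function names are hypothetical and rows are lists of exact fractions.

```python
from fractions import Fraction as F

def reduce_rows(matrix):
    """Reduced representation: add together similar rows and drop all-zero rows.
    Rows are returned sorted, so that row order plays no role."""
    merged = {}
    for row in matrix:
        weight = sum(row)
        if weight == 0:
            continue                               # all-zero rows are removed
        key = tuple(x / weight for x in row)       # the normalised row
        merged[key] = merged.get(key, F(0)) + weight
    return sorted([w * x for x in key] for key, w in merged.items())

def similar(A, B):
    """Two hyper-distribution matrices are similar if their reductions agree."""
    return reduce_rows(A) == reduce_rows(B)
```

For instance, a matrix containing the same fraction twice at half weight, or carrying extra all-zero rows, is similar to the matrix in which those rows have been merged or dropped.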

While the column-order of $((\Delta))$ is fixed by our (arbitrary) ordering of $\mathcal{V}{\times}\mathcal{H}$,
the row-order might vary since there is no intrinsic order on fractions. We
therefore regard $((\Delta))$ as determined only up to similarity, and our reasoning below
will be restricted to operations for which similarity is a congruence. In
particular we have that $((\Delta_1)) \approx ((\Delta_2))$ implies $\Delta_1 = \Delta_2$, i.e. that $((\cdot))$ is injective up to
similarity.

The operation $((\cdot))$ on (sub-)hyper-distributions is linear in the sense that

$$((p * \Delta)) \;=\; p * ((\Delta)) \qquad\text{and}\qquad ((\Delta_1 + \Delta_2)) \;\approx\; ((\Delta_1)) \,\ddagger\, ((\Delta_2))\ . \qquad (21)$$

B.4 Classical commands as matrices 

We recall from §2.3 that the classical "relational" semantics $[\![P]\!]_c$ of a program
$P$ is a function $\mathcal{V}{\times}\mathcal{H} \to \mathbb{D}(\mathcal{V}{\times}\mathcal{H})$ and may hence be treated (just as $D.v.h$ from
§7.1 was) as an $N{\times}N$ matrix written $(|P|)$ whose value in row $(v,h)$ and column
$(v',h')$ is just $[\![P]\!]_c.(v,h).(v',h')$. ^30

Sequential composition between classical commands is then represented by
matrix multiplication, in the usual Markov style, so that we have

$$(|P_1;P_2|) \;=\; (|P_1|) \times (|P_2|)\ . \qquad (22)$$

B.5 Secure commands as matrices 

We will establish that for any secure program $P$ there is an $I$-indexed set of
$N{\times}N$ matrices such that

$$(([\![P]\!].(v,\delta))) \;\approx\; (\ddagger\, i{:}\,I \cdot ((v,\delta)) \times M_i) \qquad (23)$$

for any split-state $(v,\delta)$. We think of the matrices as giving a normal form for
$P$. Using the normal form, we will be able to represent the lifting of $P$'s secure
semantics using matrix operations, since then

$$(((\odot\,(v,\delta){:}\,\Delta \cdot [\![P]\!].(v,\delta)))) \;\approx\; (\ddagger\, i{:}\,I \cdot ((\Delta)) \times M_i) \qquad (24)$$

can be established by the calculation



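^30 Note that operation $(|\cdot|)$ applies to texts, i.e. syntax, but $((\cdot))$ applies to
hyper-distributions, i.e. semantics.

$$\begin{array}{cll}
 & (((\odot\,(v,\delta){:}\,\Delta \cdot [\![P]\!].(v,\delta)))) \\
= & ((\,(\textstyle\sum\,(v,\delta){:}\,\lceil\Delta\rceil \cdot \Delta.(v,\delta) * [\![P]\!].(v,\delta))\,)) & \text{"definition expected value §2.1"} \\
\approx & (\ddagger\,(v,\delta){:}\,\lceil\Delta\rceil \cdot \Delta.(v,\delta) * (([\![P]\!].(v,\delta)))) & \text{"from (21)"} \\
\approx & (\ddagger\,(v,\delta){:}\,\lceil\Delta\rceil \cdot \Delta.(v,\delta) * (\ddagger\, i{:}\,I \cdot ((v,\delta)) \times M_i)) & \text{"normal form (23)"} \\
= & (\ddagger\,(v,\delta){:}\,\lceil\Delta\rceil;\ i{:}\,I \cdot (\Delta.(v,\delta) * ((v,\delta))) \times M_i) & \text{"distribute multiplication"} \\
\approx & (\ddagger\, i{:}\,I \cdot (\ddagger\,(v,\delta){:}\,\lceil\Delta\rceil \cdot \Delta.(v,\delta) * ((v,\delta))) \times M_i) & \text{"rearrange rows; distribute post-multiplication"} \\
\approx & (\ddagger\, i{:}\,I \cdot ((\Delta)) \times M_i)\ . & \text{"from (20) defining } ((\Delta)) \text{"}
\end{array}$$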

We now show by structural induction how embedded classical commands, general 
choice, sequential composition (and hence all of our secure commands) can be 
translated into this normal form. 



Embedded classical commands In Def. 3 from §8.1 we gave the semantics
$[\![\langle\!\langle P\rangle\!\rangle]\!]$ of a program $P$ considered as an atomic unit; we now do the same here
in matrix style.

If we were to execute an atomic program $\langle\!\langle P\rangle\!\rangle$ from a split-state $(v,\delta)$, in the
matrix style we would begin by calculating $((v,\delta)) \times (|P|)$, giving again a single
row; but that row might not be V-unique, in which case a further step would
be needed. We'd split its possibly non-unique rows into (maximally) V-unique
portions, an operation that corresponds roughly to the embed function used in
Def. 1.

Given a row matrix $R$ that is $\mathcal{V}{\times}\mathcal{H}$-indexed by column (such as the output
$((v,\delta)) \times (|P|)$ from just above) the splitting of its possibly non-unique row is
achieved via

$$\mathrm{embed}.R \;:=\; (\ddagger\, v'{:}\,\mathcal{V} \cdot R \times 1_{v'})\ , \qquad (25)$$

in which each of the values $v'$ in $\mathcal{V}$ is used, in turn, to construct a row matrix
of characteristic $v'$ projected from $R$ by zeroing all other entries: those
characteristic-$v'$ projections are then stacked on top of each other with $(\ddagger)$ to
make a single (possibly quite tall!) matrix that is derived from $R$ but now is
V-unique. ^31 With that apparatus, we have

$$(([\![\langle\!\langle P\rangle\!\rangle]\!].(v,\delta))) \;\approx\; (\ddagger\, v'{:}\,\mathcal{V} \cdot ((v,\delta)) \times (|P|) \times 1_{v'})\ , \qquad (26)$$

thereby giving the V-unique matrix representation (up to similarity) of the
hyper-distribution output by $\langle\!\langle P\rangle\!\rangle$ if executed from incoming split-state $(v,\delta)$. ^32
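
A minimal Python sketch of the splitting step (25) follows. It assumes a row is a list aligned with an explicit list `index` of `(v, h)` column labels; the function name and that encoding are ours.

```python
def embed(row, index, values):
    """embed.R: for each v' in `values`, keep only the entries of `row` whose
    column label has first component v', and stack the resulting rows."""
    return [[x if label[0] == v2 else 0 for x, label in zip(row, index)]
            for v2 in values]

index = [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]   # hypothetical V x H order
mixed = [0.5, 0.0, 0.25, 0.25]                     # a row that is not V-unique
stacked = embed(mixed, index, [0, 1])
```

Each stacked row mentions only one `v`, and summing the stacked rows column by column gives back the original row, so no probability mass is created or lost.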

General choice For both general choice and sequential composition we assume
inductively that the semantics of subprograms $P_1$ and $P_2$ can be written in

^31 For example, if the row $R$ is V-unique already then $R' := \mathrm{embed}.R$ will stack up a
great many all-zero rows. But still we will have $R \approx R'$, so no damage is done.

^32 Note the algebra of similarity here: if we have $R \approx ((v,\delta))$ for some $R$, then
$(\ddagger\, v'{:}\,\mathcal{V} \cdot R \times (|P|) \times 1_{v'})$ is similar to the right-hand side above.

matrix normal form so that for each split-state $(v,\delta)$ we have

$$(([\![P_i]\!].(v,\delta))) \;\approx\; (\ddagger\, j_i{:}\,J_i \cdot ((v,\delta)) \times M_{i,j_i})\ .$$

To show that general choice can be expressed in matrix normal form, we use
the following identity which expresses the conditioning of a split-state $(v,\delta)$ by
expression $E.v.h$ in terms of matrix operations:

$$(\odot\, h{:}\,\delta \cdot E.v.h) * ((v, \{\!\{h{:}\,\delta \mid E.v.h\}\!\})) \;=\; ((v,\delta)) \times \lVert E.v.h\rVert\ . \qquad (27)$$
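
Identity (27) can be checked numerically on a small instance. In the Python sketch below the visible value v is held fixed and therefore omitted, `delta` is a dictionary over hypothetical hidden values, and `E` is an arbitrary example expression with values in [0,1]; all of these are assumptions made for illustration, and the conditioning requires the expected value `p` to be non-zero.

```python
from fractions import Fraction as F

def condition(delta, E):
    """Return (p, delta') with p the expected value of E over delta and
    delta' the conditioned distribution {{h: delta | E.h}} (needs p != 0)."""
    p = sum(delta[h] * E(h) for h in delta)
    return p, {h: delta[h] * E(h) / p for h in delta}

delta = {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)}
E = lambda h: F(h, 2)                      # hypothetical expression in [0, 1]

p, conditioned = condition(delta, E)
hs = sorted(delta)
lhs = [p * conditioned[h] for h in hs]     # scaled row of the conditioned split-state
rhs = [delta[h] * E(h) for h in hs]        # original row times the diagonal matrix of E
```

The two rows agree entry by entry because scaling by `p` exactly undoes the normalisation performed by the conditioning.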

We then have 

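$$\begin{array}{cl}
 & (([\![P_1 \;{}_{q.v.h}\!\oplus\; P_2]\!].(v,\delta))) \\[2pt]
= & \text{"general choice from Fig. 1; } p := (\odot\, h{:}\,\delta \cdot q.v.h) \text{"} \\
 & ((\, p * [\![P_1]\!].(v,\{\!\{h{:}\,\delta \mid q.v.h\}\!\}) \;+\; (1{-}p) * [\![P_2]\!].(v,\{\!\{h{:}\,\delta \mid 1{-}q.v.h\}\!\}) \,)) \\[2pt]
\approx & \text{"from (21)"} \\
 & p * (([\![P_1]\!].(v,\{\!\{h{:}\,\delta \mid q.v.h\}\!\}))) \;\ddagger\; (1{-}p) * (([\![P_2]\!].(v,\{\!\{h{:}\,\delta \mid 1{-}q.v.h\}\!\}))) \\[2pt]
\approx & \text{"inductive assumption: matrix normal form of } P_1 \text{ and } P_2 \text{"} \\
 & p * (\ddagger\, j_1{:}\,J_1 \cdot ((v,\{\!\{h{:}\,\delta \mid q.v.h\}\!\})) \times M_{1,j_1}) \\
 & \ddagger\; (1{-}p) * (\ddagger\, j_2{:}\,J_2 \cdot ((v,\{\!\{h{:}\,\delta \mid 1{-}q.v.h\}\!\})) \times M_{2,j_2}) \\[2pt]
= & \text{"distribute scalar multiplications"} \\
 & (\ddagger\, j_1{:}\,J_1 \cdot (p * ((v,\{\!\{h{:}\,\delta \mid q.v.h\}\!\}))) \times M_{1,j_1}) \\
 & \ddagger\; (\ddagger\, j_2{:}\,J_2 \cdot ((1{-}p) * ((v,\{\!\{h{:}\,\delta \mid 1{-}q.v.h\}\!\}))) \times M_{2,j_2}) \\[2pt]
= & \text{"recall } p := (\odot\, h{:}\,\delta \cdot q.v.h) \text{; from (27)"} \\
 & (\ddagger\, j_1{:}\,J_1 \cdot ((v,\delta)) \times \lVert q.v.h\rVert \times M_{1,j_1}) \\
 & \ddagger\; (\ddagger\, j_2{:}\,J_2 \cdot ((v,\delta)) \times \lVert 1{-}q.v.h\rVert \times M_{2,j_2}) \\[2pt]
= & \text{"Let } p_1.v.h := q.v.h \text{ and } p_2.v.h := 1{-}q.v.h \text{"} \\
 & (\ddagger\, i{:}\,\{1,2\};\ j{:}\,J_i \cdot ((v,\delta)) \times \lVert p_i.v.h\rVert \times M_{i,j})\ .
\end{array}$$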



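Sequential composition For sequential composition of $P_1$ and $P_2$ we have

$$\begin{array}{cll}
 & (([\![P_1;P_2]\!].(v,\delta))) \\
= & ((\,(\odot\,(v',\delta'){:}\,[\![P_1]\!].(v,\delta) \cdot [\![P_2]\!].(v',\delta'))\,)) & \text{"Composition from Fig. 1"} \\
\approx & (\ddagger\, j_2{:}\,J_2 \cdot (([\![P_1]\!].(v,\delta))) \times M_{2,j_2}) & \text{"(24); matrix normal form of } P_2 \text{"} \\
\approx & (\ddagger\, j_1{:}\,J_1;\ j_2{:}\,J_2 \cdot ((v,\delta)) \times M_{1,j_1} \times M_{2,j_2})\ . & \text{"matrix normal form of } P_1 \text{"}
\end{array}$$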



B.6 Refinement as matrix multiplication 

In §7.1 we showed how refinement between partitions could be defined using
matrix multiplication. We can promote this to hyper-distributions by dealing
with each $v$ separately: we have that hyper-distribution $\Delta_S$ is refined by $\Delta_I$ just
when for each $v \in \mathcal{V}$ there exists a refinement matrix (i.e. a non-negative,
column-one-summing matrix) $R_v$ such that

$$R_v \times ((\Delta_S)) \times 1_v \;\approx\; ((\Delta_I)) \times 1_v\ . \qquad (28)$$

The effect of requiring similarity for each $v$ separately is to prevent rows with
differing $v$'s from being added together.



C Proofs for the refinement relation 

C.1 Secure programs are partially ordered by $(\sqsubseteq)$

We show (Thm. 1 in §6.5) that the refinement relation $(\sqsubseteq)$ defines a partial order
over hyper-distributions. It follows by extension that it is a partial order over
secure programs.

Reflexivity For any hyper-distribution $\Delta$ reflexivity holds trivially since, for
each $v$, the intermediate partition $\mathrm{fracs}.\Delta.v$ is both similar to and as fine as itself.

Transitivity Assume that $\Delta_1 \sqsubseteq \Delta_2$ and $\Delta_2 \sqsubseteq \Delta_3$. It is enough to show that for
each $v$ we have $\mathrm{fracs}.\Delta_1.v \sqsubseteq \mathrm{fracs}.\Delta_3.v$. For each $i$ let $\Pi_i$ be the $N{\times}N$ matrix
representation (§7.1) of $\mathrm{fracs}.\Delta_i.v$ for some $N$. To prove $\Pi_1 \sqsubseteq \Pi_3$, we need to find
a refinement matrix $R_{31}$ such that $\Pi_3$ is $R_{31} \times \Pi_1$.

From above there are refinement matrices $R_{32}, R_{21}$ with $\Pi_3 = R_{32} \times \Pi_2$ and
$\Pi_2 = R_{21} \times \Pi_1$. Thus $R_{31}$ defined as $R_{32} \times R_{21}$ satisfies $\Pi_3 = R_{31} \times \Pi_1$, and it is a
refinement matrix by Property (11) from §7.1.

Antisymmetry Assume that both $\Delta_1 \sqsubseteq \Delta_2$ and $\Delta_2 \sqsubseteq \Delta_1$ but $\Delta_1 \neq \Delta_2$. From
the first and third assumptions, with Lem. 6 (§G.1) we have that the Shannon
Entropy of $\Delta_1$ is strictly less than that of $\Delta_2$; from the second and third, we
have the opposite, and thus a contradiction.

C.2 Monotonicity of secure programs w.r.t. $(\sqsubseteq)$

We use the following technical results to verify that $(\sqsubseteq)$ is monotonic with respect
to secure program contexts (Thm. 2 from §6.5). They are verified using the
matrix algebra from §B above.

Lemma 3. For any indexed set of matrices $\{i{:}\,I \cdot M_i\}$ each of dimension
$F{\times}N$ and corresponding refinement matrices $R_i$ each having $F_i$ rows and $F$
columns, there exists a refinement matrix $R$ such that

$$(\ddagger\, i{:}\,I \cdot R_i \times M_i) \;=\; R \times (\ddagger\, i{:}\,I \cdot M_i)\ . \qquad (29)$$

Proof: Refinement matrix $R$ can be given directly as

$$(\ddagger\, i{:}\,I \cdot ({+\!\!+}\, i'{:}\,I \cdot (\,R_i \text{ if } i{=}i' \text{ else } 0_{F_i \times F}\,)))\ .$$

That $R$ is a refinement matrix (i.e. it has non-negative entries and is
column-one-summing) follows from its definition and the fact that each $R_i$ is a refinement
matrix. It can be established by matrix multiplication that (29) holds. ^33 □
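
The block pattern behind Lemma 3 can be sketched in Python: $R$ carries the $R_i$ down its diagonal with zero blocks elsewhere. Matrices are nested lists, and the helper names and the small example matrices are ours.

```python
def mul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def vcat(mats):
    """Vertical concatenation of matrices with equal column-counts."""
    return [list(row) for m in mats for row in m]

def block_diagonal(blocks):
    """The R of Lemma 3: the blocks down the diagonal, zeros everywhere else."""
    total_cols = sum(len(b[0]) for b in blocks)
    R, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            R.append([0] * offset + list(row) + [0] * (total_cols - offset - width))
        offset += width
    return R

Rs = [[[1, 1]], [[1, 0], [0, 1]]]             # two small refinement matrices
Ms = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]     # two arbitrary 2x2 matrices
lhs = vcat([mul(R_i, M_i) for R_i, M_i in zip(Rs, Ms)])
rhs = mul(block_diagonal(Rs), vcat(Ms))
```

Each column of the block-diagonal matrix meets exactly one block, so it inherits that block's column sum of 1.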

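Lemma 4. Additive monotonicity of hyper-distributions For probability $p{:}\,[0,1]$,
and hyper-distributions $\Delta_{S_1}, \Delta_{S_2}, \Delta_{I_1}, \Delta_{I_2}$ we have that $\Delta_{S_{\{1,2\}}} \sqsubseteq \Delta_{I_{\{1,2\}}}$ implies

$$\Delta_{S_1} \;{}_p\!\oplus\; \Delta_{S_2} \;\sqsubseteq\; \Delta_{I_1} \;{}_p\!\oplus\; \Delta_{I_2}\ .$$

Proof: From (28) it is enough for each $v$ to find a refinement matrix $R$ such
that

$$R \times ((\Delta_{S_1} \;{}_p\!\oplus\; \Delta_{S_2})) \times 1_v \;\approx\; ((\Delta_{I_1} \;{}_p\!\oplus\; \Delta_{I_2})) \times 1_v\ .$$

We have:

$$\begin{array}{cl}
 & ((\Delta_{I_1} \;{}_p\!\oplus\; \Delta_{I_2})) \times 1_v \\[2pt]
\approx & (\, p * ((\Delta_{I_1})) \;\ddagger\; (1{-}p) * ((\Delta_{I_2})) \,) \times 1_v \qquad \text{"from (21)"} \\[2pt]
= & p * ((\Delta_{I_1})) \times 1_v \;\ddagger\; (1{-}p) * ((\Delta_{I_2})) \times 1_v \qquad \text{"distribute post-multiplication"} \\[2pt]
\approx & \text{"} \Delta_{S_i} \sqsubseteq \Delta_{I_i} \text{ implies } ((\Delta_{I_i})) \times 1_v \approx R_i \times ((\Delta_{S_i})) \times 1_v \text{ for some refinement matrix } R_i \text{"} \\
 & p * R_1 \times ((\Delta_{S_1})) \times 1_v \;\ddagger\; (1{-}p) * R_2 \times ((\Delta_{S_2})) \times 1_v \\[2pt]
= & \text{"commute scalar multiplication; distribute post-multiplication"} \\
 & (\, R_1 \times p * ((\Delta_{S_1})) \;\ddagger\; R_2 \times (1{-}p) * ((\Delta_{S_2})) \,) \times 1_v \\[2pt]
= & \text{"Lem. 3 for some refinement matrix } R \text{"} \\
 & R \times (\, p * ((\Delta_{S_1})) \;\ddagger\; (1{-}p) * ((\Delta_{S_2})) \,) \times 1_v \\[2pt]
\approx & R \times ((\Delta_{S_1} \;{}_p\!\oplus\; \Delta_{S_2})) \times 1_v\ . \qquad \text{"from (21)"}
\end{array}$$

□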

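Lemma 5. Pointwise monotonicity For all program texts $P$ and
hyper-distributions $\Delta_S$ and $\Delta_I$ such that $\Delta_S \sqsubseteq \Delta_I$, we have

$$(\odot\,(v,\delta){:}\,\Delta_S \cdot [\![P]\!].(v,\delta)) \;\sqsubseteq\; (\odot\,(v,\delta){:}\,\Delta_I \cdot [\![P]\!].(v,\delta))\ .$$

Proof: Let $\{i{:}\,I \cdot M_i\}$ be a set of $N{\times}N$ matrices giving the normal form
for $[\![P]\!]$ as at (23) above, so that for any $(v,\delta)$ we have

$$(([\![P]\!].(v,\delta))) \;\approx\; (\ddagger\, i{:}\,I \cdot ((v,\delta)) \times M_i)\ .$$

From (28) and (24), it is enough to show that for each $v' \in \mathcal{V}$ there exists a
refinement matrix $R$ such that $R \times (\ddagger\, i{:}\,I \cdot ((\Delta_S)) \times M_i) \times 1_{v'} \approx (\ddagger\, i{:}\,I \cdot ((\Delta_I)) \times M_i) \times 1_{v'}$.
We have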



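^33 A sketch of the block matrices helps to see the pattern.

$$\begin{array}{cll}
 & (\ddagger\, i{:}\,I \cdot ((\Delta_I)) \times M_i) \times 1_{v'} \\[2pt]
\approx & (\ddagger\, i{:}\,I \cdot (\ddagger\, v{:}\,\mathcal{V} \cdot ((\Delta_I)) \times 1_v) \times M_i) \times 1_{v'} & \text{"} ((\Delta_I)) \text{ is V-unique"} \\[2pt]
= & (\ddagger\, i{:}\,I;\ v{:}\,\mathcal{V} \cdot ((\Delta_I)) \times 1_v \times M_i) \times 1_{v'} & \text{"distribute post-multiplication"} \\[2pt]
\approx & (\ddagger\, i{:}\,I;\ v{:}\,\mathcal{V} \cdot R_v \times ((\Delta_S)) \times 1_v \times M_i) \times 1_{v'} & \text{"} \Delta_S \sqsubseteq \Delta_I \text{ implies } ((\Delta_I)) \times 1_v \approx R_v \times ((\Delta_S)) \times 1_v \\
 & & \text{ for some refinement matrix } R_v \text{"} \\[2pt]
= & R \times (\ddagger\, i{:}\,I;\ v{:}\,\mathcal{V} \cdot ((\Delta_S)) \times 1_v \times M_i) \times 1_{v'} & \text{"Lem. 3 for some refinement matrix } R \text{"} \\[2pt]
\approx & R \times (\ddagger\, i{:}\,I \cdot ((\Delta_S)) \times M_i) \times 1_{v'}\ . & \text{"distribute post-multiplication; } ((\Delta_S)) \text{ is V-unique"}
\end{array}$$

□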



Monotonicity of secure programs w.r.t. $(\sqsubseteq)$ Using Lem. 4 and Lem. 5 we
now prove Thm. 2 from §6.5. We must show that if $S \sqsubseteq I$ then $C(S) \sqsubseteq C(I)$ for all
contexts $C$ built from programs as defined in Fig. 1.

We use structural induction. For the base case, context $C(S) := S$ is trivially
monotonic.

General probabilistic choice (and hence probabilistic and conditional choice) 
is trivially monotonic in either argument from monotonicity of addition over 
hyper-distributions (Lem. 4) . For example, for monotonicity in the first argument 
we have 

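$$\begin{array}{cl}
 & [\![S \;{}_{q.v.h}\!\oplus\; R]\!].(v,\delta) \\[2pt]
= & \text{"Let } q_\delta := (\odot\, h{:}\,\delta \cdot q.v.h) \text{; General choice from Fig. 1"} \\
 & (\,[\![S]\!].(v,\{\!\{h{:}\,\delta \mid q.v.h\}\!\}) \;{}_{q_\delta}\!\oplus\; [\![R]\!].(v,\{\!\{h{:}\,\delta \mid 1{-}q.v.h\}\!\})\,) \\[2pt]
\sqsubseteq & (\,[\![I]\!].(v,\{\!\{h{:}\,\delta \mid q.v.h\}\!\}) \;{}_{q_\delta}\!\oplus\; [\![R]\!].(v,\{\!\{h{:}\,\delta \mid 1{-}q.v.h\}\!\})\,) \qquad \text{"} S \sqsubseteq I \text{; Lem. 4"} \\[2pt]
= & [\![I \;{}_{q.v.h}\!\oplus\; R]\!].(v,\delta)\ . \qquad \text{"General choice from Fig. 1"}
\end{array}$$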

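To show monotonicity of sequential composition in its right-hand argument
we have for any programs $R$ and $S \sqsubseteq I$ and initial state $(v,\delta)$ that

$$\begin{array}{cll}
 & [\![R;S]\!].(v,\delta) \\
= & (\odot\,(v',\delta'){:}\,[\![R]\!].(v,\delta) \cdot [\![S]\!].(v',\delta')) & \text{"Composition from Fig. 1"} \\
\sqsubseteq & (\odot\,(v',\delta'){:}\,[\![R]\!].(v,\delta) \cdot [\![I]\!].(v',\delta')) & \text{"} S \sqsubseteq I \text{; Lem. 4"} \\
= & [\![R;I]\!].(v,\delta)\ . & \text{"Composition from Fig. 1"}
\end{array}$$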

For monotonicity in the first argument we have

$$\begin{array}{cll}
 & [\![S;R]\!].(v,\delta) \\
= & (\odot\,(v',\delta'){:}\,[\![S]\!].(v,\delta) \cdot [\![R]\!].(v',\delta')) & \text{"Composition from Fig. 1"} \\
\sqsubseteq & (\odot\,(v',\delta'){:}\,[\![I]\!].(v,\delta) \cdot [\![R]\!].(v',\delta')) & \text{"} S \sqsubseteq I \text{ and Lem. 5"} \\
= & [\![I;R]\!].(v,\delta)\ . & \text{"Composition from Fig. 1"}
\end{array}$$

D Example of completeness construction

Here we illustrate the completeness proof set out in §7.3 by applying it to the
example of §6, where we claimed that $P_4 \not\sqsubseteq P_2$. We use §7.3 to find a $C$ such that
indeed $P_4;C \not\preceq P_2;C$.

Our $v'$ is 1, since that is where we find the difference between $P_2$ and $P_4$ in
the residual uncertainties of h; with that, we extract the fractions

$$\Pi_{P_2} = \{\ \{\!\{1^{@\frac16}\}\!\},\ \{\!\{1^{@\frac16},3^{@\frac16}\}\!\},\ \{\!\{3^{@\frac16}\}\!\}\ \} \quad\text{and}\quad \Pi_{P_4} = \{\ \{\!\{1^{@\frac14},3^{@\frac1{12}}\}\!\},\ \{\!\{1^{@\frac1{12}},3^{@\frac14}\}\!\}\ \}$$

and find that there are two values of h, two fractions in $\Pi_{P_4}$ and three fractions
in $\Pi_{P_2}$. Accordingly we set $N$ to 3 and include an extra column for h=2 and
an extra, zero fraction in $\Pi_{P_4}$. Note that $\sum \Pi_{P_2} = \sum \Pi_{P_4}$ and that the total
weight (of each) is 2/3.

The $N{\times}N$ matrix corresponding to $\Pi_{P_2}$ is then

$$\begin{pmatrix} 1/6 & 0 & 0 \\ 1/6 & 0 & 1/6 \\ 0 & 0 & 1/6 \end{pmatrix}$$

with its columns corresponding to values 1, 2, 3 of h and the rows to $\Pi_{P_2}$'s
three fractions. The point obtained by concatenating the rows is
(1/6, 0, 0, 1/6, 0, 1/6, 0, 0, 1/6), and is in 9-dimensional space; but to avoid a
proliferation of fractions, we scale everything up from now on by a factor of 12,
and so take $x_{P_2}$ to be the point (2,0,0,2,0,2,0,0,2).

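Now the scaled-up (and extended) matrix corresponding to $\Pi_{P_4}$, with a
selection of refinement-forming matrices $M$ in $\mathcal{M}_N$, is given by

$$\begin{pmatrix} 3&0&1 \\ 1&0&3 \\ 0&0&0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1&1&1 \\ 0&0&0 \\ 0&0&0 \end{pmatrix},\ \begin{pmatrix} 1&0&1 \\ 0&1&0 \\ 0&0&0 \end{pmatrix},\ \begin{pmatrix} 1&0&1 \\ 0&0&0 \\ 0&1&0 \end{pmatrix},\ \begin{pmatrix} 0&1&1 \\ 1&0&0 \\ 0&0&0 \end{pmatrix}\ .$$

Carrying out the matrix multiplications gives us these four possible refinements
of $\Pi_{P_4}$:

$$\begin{pmatrix} 4&0&4 \\ 0&0&0 \\ 0&0&0 \end{pmatrix},\ \begin{pmatrix} 3&0&1 \\ 1&0&3 \\ 0&0&0 \end{pmatrix},\ \begin{pmatrix} 3&0&1 \\ 0&0&0 \\ 1&0&3 \end{pmatrix},\ \begin{pmatrix} 1&0&3 \\ 3&0&1 \\ 0&0&0 \end{pmatrix}\ .$$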

Doing all of them for $M$ in $\mathcal{M}_N$, and concatenating their rows to make points
in 9-dimensional space, gives us this collection of refinements altogether: 

{ (4, 0, 4, 0, 0, 0, 0, 0, 0) , (3, 0, 1, 1, 0, 3, 0, 0, 0) , (3, 0, 1, 0, 0, 0, 1, 0, 3) , (30) 

(1,0,3,3,0,1,0,0,0) , (0,0,0,4,0,4,0,0,0) , (0,0,0,3,0,1,1,0,3) , 
(1, 0, 3, 0, 0, 0, 3, 0, 1) , (0, 0, 0, 1, 0, 3, 3, 0, 1) , (0, 0, 0, 0, 0, 0, 4, 0, 4) } . 

Our claim that $P_4 \not\sqsubseteq P_2$ suggests that the point $x_{P_2}$ (corresponding to the matrix
derived from $\Pi_{P_2}$) should not lie in the convex closure of the points (30) above.
We can see this easily by concentrating on the first and third dimensions
only: for $\Pi_{P_2}$ we get (2,0); and for $\Pi_{P_4}$, that is (30), after removing duplicates
we get (4,4), (3,1), (1,3) and (0,0). The $\Pi_{P_2}$-point $x_{P_2}$ is not in the convex
closure of the other four because all of them have their two coordinates both
positive or both zero, a property preserved by any convex combination but not
shared by (2, 0).
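
The enumeration behind (30) is small enough to reproduce mechanically. The Python sketch below is our own illustration (the names are hypothetical): it multiplies the scaled matrix for $P_4$ by every 0/1 column-one-summing matrix, flattens each result to a point, and then projects onto the first and third coordinates.

```python
from itertools import product

PI_P4 = [(3, 0, 1), (1, 0, 3), (0, 0, 0)]     # scaled by 12, with the extra zero fraction
X_P2 = (2, 0, 0, 2, 0, 2, 0, 0, 2)            # scaled point for P2

def refinements(rows):
    """All products M x rows, for 0/1 matrices M with one 1 per column,
    each flattened row by row into a single point."""
    n = len(rows)
    points = set()
    for targets in product(range(n), repeat=n):   # targets[j]: output row receiving input row j
        out = [[0] * len(rows[0]) for _ in range(n)]
        for j, t in enumerate(targets):
            out[t] = [a + b for a, b in zip(out[t], rows[j])]
        points.add(tuple(x for r in out for x in r))
    return points

points = refinements(PI_P4)
projected = {(p[0], p[2]) for p in points}        # first and third dimensions only
p2_projected = (X_P2[0], X_P2[2])
```

Every projected refinement has its two coordinates both positive or both zero, whereas the projected point for $P_2$ does not, which is the property used in the argument above.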



Now that we can concentrate on just two dimensions, it's easy to find a
separating hyperplane with a picture. Fig. 4 shows the $\Pi_{P_2}$-point as an open
circle at (2,0), while the filled circles give the vertices of the diamond-shaped
convex closure corresponding to refinements of $\Pi_{P_4}$. Clearly the (degenerate)
hyperplane $y = (x-1)/3$ separates the point from the diamond. The normal of
that hyperplane (up and left, perpendicular to the line) has direction $(-1,3)$,
and when we fill in the other seven dimensions as zero -since they're not needed
for separation- that gives us a candidate normal of $X := (-1,0,3,0,0,0,0,0,0)$
in 9-space. By translating $X$ to matrix form and transposing it, we can then give
a tentative definition of $D$ as the matrix with rows $(-1,0,0)$, $(0,0,0)$ and $(3,0,0)$.
However this is not quite our final value for it.

The dot-product of $X$ with $\Pi_{P_2}$, that is $\mathrm{tr}.(\Pi_{P_2} \times D)$,
turns out to be $-2$; and with the refinements of $\Pi_{P_4}$ shown
at (30) we get the dot-products 8 and 0 (multiple times),
showing indeed the separation we expect, but in the wrong direction: the values
0 and 8 for $P_4$ are both strictly greater than the value $-2$ for $P_2$, and we want
them to be strictly less. Accordingly we multiply the tentative $D$ above by $-1$ ^34
and then add 3 to all its elements to make them non-negative; finally we divide
everything by 10 to make its rows sum to no more than 1. To get the final $D$ from
this we must then add a new "zero-th column" to make each row one-summing
exactly. That gives

$$D \;:=\; \begin{pmatrix} 0 & 0.4 & 0.3 & 0.3 \\ 0.1 & 0.3 & 0.3 & 0.3 \\ 0.4 & 0 & 0.3 & 0.3 \end{pmatrix}\ .$$
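
The arithmetic of this construction can be replayed in a few lines of Python. The sketch is ours: the nine points are copied from (30), the variable names are hypothetical, and exact fractions are used so that the row sums come out exactly.

```python
from fractions import Fraction as F

X = (-1, 0, 3, 0, 0, 0, 0, 0, 0)              # candidate normal from the text
X_P2 = (2, 0, 0, 2, 0, 2, 0, 0, 2)            # scaled point for P2
POINTS_30 = [(4, 0, 4, 0, 0, 0, 0, 0, 0), (3, 0, 1, 1, 0, 3, 0, 0, 0),
             (3, 0, 1, 0, 0, 0, 1, 0, 3), (1, 0, 3, 3, 0, 1, 0, 0, 0),
             (0, 0, 0, 4, 0, 4, 0, 0, 0), (0, 0, 0, 3, 0, 1, 1, 0, 3),
             (1, 0, 3, 0, 0, 0, 3, 0, 1), (0, 0, 0, 1, 0, 3, 3, 0, 1),
             (0, 0, 0, 0, 0, 0, 4, 0, 4)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

p2_value = dot(X, X_P2)                               # value for P2
p4_values = sorted({dot(X, p) for p in POINTS_30})    # distinct values for refinements of P4

# Tentative D: read X as a 3x3 matrix (row by row), then transpose it.
tentative = [[X[3 * r + c] for r in range(3)] for c in range(3)]
# Negate, add 3, divide by 10; then prepend the zero-th column so each row sums to 1.
scaled = [[F(3 - x, 10) for x in row] for row in tentative]
D = [[1 - sum(row)] + row for row in scaled]
```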



We insert a hyperplane (just a line, in 
2-space) midway between the separated 
point and the convex shape, parallel to the 
boundary of the latter. 

Fig. 4. Finding a separating hyperplane
$y = (x-1)/3$ in 2-space.

The distinguishing context $(-;C)$, say, must then overwrite h according to the
distribution given by a row of $D$, the one selected by the value of h incoming
to $C$; thus we construct $C$ to be

$$\begin{array}{l}
\textbf{if } v{=}1 \textbf{ then } h {:\in}\, (\ \{\!\{1^{@0.4},2^{@0.3},3^{@0.3}\}\!\} \ \textbf{if } h{=}1 \textbf{ else } \{\!\{0^{@0.4},2^{@0.3},3^{@0.3}\}\!\}\ ) \\
\textbf{else } h := 0 \ \textbf{fi}
\end{array}$$



^34 The fact we can simply multiply by $-1$ to reverse the sense of the comparison does
not mean we can just as easily construct a context to show $P_2 \not\sqsubseteq P_4$, which would
indeed be a worry. In fact for $P_2 \not\sqsubseteq P_4$ we'd need a shape $X_{P_2}$ and a point $x_{P_4}$; and
then we would find $x_{P_4}$ inside the shape $X_{P_2}$, thus unable to be separated from it.

with the outer if effectively restricting our attack to occur only when $v'{=}1$.
(That allows us to ignore the h=2 case in $D$, as well.) Thus we have our context
$(-;C)$. Let us now check that it actually works.

We begin with $P_4;C$. Its output hyper-distribution is (after some calculation)
given by

$$\{\!\{\ (0,\{\!\{0\}\!\})^{@\frac13},\ \ (1,\{\!\{0^{@0.1},1^{@0.3},2^{@0.3},3^{@0.3}\}\!\})^{@\frac13},\ \ (1,\{\!\{0^{@0.3},1^{@0.1},2^{@0.3},3^{@0.3}\}\!\})^{@\frac13}\ \}\!\}\ ,$$

whose Bayes Vulnerability is 1/3*1 + 1/3*0.3 + 1/3*0.3 ≈ 0.53. On the other
hand, for $P_2;C$ the output hyper-distribution is

$$\{\!\{\ (0,\{\!\{0\}\!\})^{@\frac13},\ \ (1,\{\!\{1^{@0.4},2^{@0.3},3^{@0.3}\}\!\})^{@\frac16},\ \ (1,\{\!\{0^{@0.2},1^{@0.2},2^{@0.3},3^{@0.3}\}\!\})^{@\frac13},\ \ (1,\{\!\{0^{@0.4},2^{@0.3},3^{@0.3}\}\!\})^{@\frac16}\ \}\!\}$$

and here the vulnerability is 1/3*1 + 1/6*0.4 + 1/3*0.3 + 1/6*0.3 ≈ 0.55. Note
that in the final summand we took 0.3 rather than the larger 0.4 associated
with 0, since as part of our construction we exclude guesses that h is 0. ^35

Thus we have established that $P_4;C \not\preceq P_2;C$ (for the adjusted $C$; see
Footnote 35), because the vulnerability of the former is 0.53 but for the latter
the vulnerability is the greater 0.55. Hence when our refinement relation insists
that $P_4 \not\sqsubseteq P_2$ -as we argued earlier above- in fact it is not being too severe, but
rather it is acting just as a compositional closure should. It protects us not only
against the context $C$ we just made, but all other contexts too, in spite of the
fact that in isolation $P_4$ and $P_2$ are not distinguished by elementary testing.
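
The two vulnerabilities can be recomputed directly from the $v'{=}1$ fractions. The Python sketch below is ours and rests on stated assumptions: fractions are dictionaries from h to exact weights (unscaled, total 2/3), the context rows are those of the adjusted context of Footnote 35 (the 0-case split over −2 and −1), and the remaining weight 1/3 comes from the else-branch, where h is set to 0 and can be guessed with certainty.

```python
from fractions import Fraction as F

def vulnerability_after(fractions, rows):
    """Bayes vulnerability contributed by the given fractions once h has been
    overwritten according to rows[h]: sum over fractions of the largest weight."""
    total = F(0)
    for frac in fractions:
        out = {}
        for h, weight in frac.items():
            for h2, p in rows[h].items():
                out[h2] = out.get(h2, F(0)) + weight * p
        total += max(out.values())
    return total

ROWS = {1: {1: F(4, 10), 2: F(3, 10), 3: F(3, 10)},
        3: {-2: F(2, 10), -1: F(2, 10), 2: F(3, 10), 3: F(3, 10)}}
PI_P2 = [{1: F(1, 6)}, {1: F(1, 6), 3: F(1, 6)}, {3: F(1, 6)}]
PI_P4 = [{1: F(1, 4), 3: F(1, 12)}, {1: F(1, 12), 3: F(1, 4)}]
OTHER = F(1, 3)          # weight of the branch in which h := 0

v_p4 = OTHER + vulnerability_after(PI_P4, ROWS)   # 8/15, about 0.53
v_p2 = OTHER + vulnerability_after(PI_P2, ROWS)   # 11/20, exactly 0.55
```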

Finally, in this example there are many hyperplanes with distinct normals 
that achieve the separation we need, and each of these may be used to construct 
different distinguishing contexts. For example, since there exists a separating 
hyperplane with normal (0,0,2,1,0,0,1,0,0) we can use it to define another 



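^35 Dealing with this detail would split the 0-case in half, uniformly distributed over
$-1, -2$, since the resulting probability 0.4/2 for each would then be small enough
that a Bayes-Vulnerability attack would never choose it. The adjusted context $C'$
would contain

$$h {:\in}\, (\ \{\!\{1^{@0.4},2^{@0.3},3^{@0.3}\}\!\} \ \textbf{if } h{=}1 \textbf{ else } \{\!\{{-2}^{@0.2},{-1}^{@0.2},2^{@0.3},3^{@0.3}\}\!\}\ )$$

and the resulting output for $P_2;C'$ would be

$$\{\!\{\ (0,\{\!\{0\}\!\})^{@\frac13},\ \ (1,\{\!\{1^{@0.4},2^{@0.3},3^{@0.3}\}\!\})^{@\frac16},\ \ (1,\{\!\{{-2}^{@0.1},{-1}^{@0.1},1^{@0.2},2^{@0.3},3^{@0.3}\}\!\})^{@\frac13},\ \ (1,\{\!\{{-2}^{@0.2},{-1}^{@0.2},2^{@0.3},3^{@0.3}\}\!\})^{@\frac16}\ \}\!\}$$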

in which neither $-2$ nor $-1$ would ever be chosen for a Bayes-Vulnerability attack.

distribution matrix

$$D \;:=\; \begin{pmatrix} 0.5 & 0.25 & 0.25 \\ - & - & - \\ 0 & 0.5 & 0.5 \end{pmatrix}$$

from which we can specify the distinguishing context $(-;C)$, where $C$ is

$$\begin{array}{l}
\textbf{if } v{=}1 \textbf{ then} \\
\quad h {:\in}\, (\ \{\!\{1^{@\frac12},2^{@\frac14},3^{@\frac14}\}\!\} \ \textbf{if } h{=}1 \textbf{ else } \{\!\{2^{@\frac12},3^{@\frac12}\}\!\}\ ) \\
\textbf{else } h := 1 \ \textbf{fi}\ ,
\end{array} \qquad (31)$$

which requires no h:=0 case for $D$ since the rows of its defining normal just
happen to have the same sum. (That's not true for the middle row; but as
before we can ignore it since, in the $v'{=}1$ case we are considering, that row is
never used.) ^36

Finding the normal that generates (31), however, is harder if done
geometrically: it turns out that we would have had to specialise to three
coordinate indices 3, 4 and 7 rather than just 1 and 3. The resulting inspection
-to see just where to slip the hyperplane in between- would then have had to be
done in three- rather than two dimensions, as Fig. 5 illustrates (in a side view).
In general such hyperplanes can of course be found, without drawing pictures, by
using constraint solvers to deal with the linear inequalities symbolically.

Fig. 5. Finding a separating hyperplane, with (2,1,1) as normal, in 3 of the 9 dimensions.
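
As a stand-in for a real constraint solver, the following Python sketch simply tries small integer normals on coordinate indices 3, 4 and 7 and keeps those that strictly separate the point for $P_2$ from all nine points of (30), and hence from their convex closure. It is a brute-force illustration of our own devising, not the method used in the paper; the search range and names are arbitrary choices.

```python
from itertools import product

X_P2 = (2, 0, 0, 2, 0, 2, 0, 0, 2)
POINTS_30 = [(4, 0, 4, 0, 0, 0, 0, 0, 0), (3, 0, 1, 1, 0, 3, 0, 0, 0),
             (3, 0, 1, 0, 0, 0, 1, 0, 3), (1, 0, 3, 3, 0, 1, 0, 0, 0),
             (0, 0, 0, 4, 0, 4, 0, 0, 0), (0, 0, 0, 3, 0, 1, 1, 0, 3),
             (1, 0, 3, 0, 0, 0, 3, 0, 1), (0, 0, 0, 1, 0, 3, 3, 0, 1),
             (0, 0, 0, 0, 0, 0, 4, 0, 4)]
COORDS = (2, 3, 6)       # zero-based positions of coordinate indices 3, 4 and 7

def separates(normal):
    """True if the dot-product with `normal` (on COORDS only) puts X_P2 strictly
    below, or strictly above, every point of POINTS_30."""
    def value(p):
        return sum(n * p[c] for n, c in zip(normal, COORDS))
    target = value(X_P2)
    others = [value(p) for p in POINTS_30]
    return target < min(others) or target > max(others)

found = [n for n in product(range(4), repeat=3) if separates(n)]
```

Because a linear function attains its extremes over a convex closure at the generating points, checking the nine points is enough to conclude separation from the whole closure.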



E Proof of the Atomicity Lemma 

To prove the atomicity distribution lemma (Lem. 1 from §8.1) we use the matrix 
algebra from §B. 



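^36 It can be shown that this is a legitimate counter-example by using (4) and (31) to
calculate the partitions (for $v'{=}1$)

$$\Pi_{P_4;C} = \{\ \{\!\{1^{@\frac18},2^{@\frac5{48}},3^{@\frac5{48}}\}\!\},\ \{\!\{1^{@\frac1{24}},2^{@\frac7{48}},3^{@\frac7{48}}\}\!\}\ \}$$

$$\Pi_{P_2;C} = \{\ \{\!\{1^{@\frac1{12}},2^{@\frac1{24}},3^{@\frac1{24}}\}\!\},\ \{\!\{1^{@\frac1{12}},2^{@\frac18},3^{@\frac18}\}\!\},\ \{\!\{2^{@\frac1{12}},3^{@\frac1{12}}\}\!\}\ \}$$

and observing that the vulnerability of $\Pi_{P_4;C}$ is 1/8 + 7/48 = 13/48, which is just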
smaller than the vulnerability of $\Pi_{P_2;C}$ at 1/12 + 1/8 + 1/12 = 7/24.

Suppose we have matrix representations $(|P_{\{1,2\}}|)$ for the classical program
texts $P_1$ and $P_2$ and a row-matrix representation $((v,\delta))$ of an incoming
split-state.

If from every initial and final v-state of $P_1;P_2$ it is possible to determine the
intermediate value of v (after $P_1$ and before $P_2$) then there must exist a total
function $f{:}\ \mathcal{V} \to \mathcal{V} \to \mathcal{V}$ such that for all $v, v'$ we have

$$1_v \times (|P_1|) \times (|P_2|) \times 1_{v'} \;=\; 1_v \times (|P_1|) \times 1_{f.v.v'} \times (|P_2|) \times 1_{v'}\ , \qquad (32)$$

from which we have for all $v'' \neq f.v.v'$ that

$$1_v \times (|P_1|) \times 1_{v''} \times (|P_2|) \times 1_{v'} \;=\; 0_N\ , \qquad (33)$$

where $0_N$ is the $N{\times}N$ matrix of zeros.

Assuming such an / with properties (32) and (33), we can calculate 

il{{Pi;P2))Uv,S))) 

« (tw':V • ((w,5)) X (|Pi;P2) X 1^0 "embedding" 

= {^v':V • {{v,d)) X (Pi) X (P2) X !„/) "classical composition" 

= {^v':V • {iv,S)) X 1„ X (Pi) X (P2) X 1,0 "((«,5))xl„ = ((«,5))" 

= (+t;':V.((«,(5))xl„x(Pi)xl/.,.„, x(P2)x V) "(32)" 

= {^v',v:V I v—f.v.v' • {iv,5)) x 1^, x (Pi) x I4, x (P2) x 1^,/) "one-point rule 

for (+)" 

~ "Oixiv is unit of concatenation, up to similarity" 

{^v',v:V\ v=,f.v.v' . {iv,5)) x 1, x (Pi) x U x (P2) x 1,,) 
^ {^v',v:V\v^f.v.v' -Oi^n) 

= "OixAT = ((«,<5))xOjv, (33)" 

{^v',v:V I v=f.v.v' ' {{v,S)) X 1, x (Pi) x U x (P2) x 1,,) 
+ {^v',v:V I v^f.v.v' . {iv,S)) X I„ x (Pi) x U x (P2) x 1„0 

w "v=f.v.v' and vj^f.v.v' are disjoint and exhaustive; 

(^) is commutative and associative up to similarity" 

i^v',v:V • {iv,6)) xUx (Pi) xUx (P2) x V) 

= {^i/,v:V . ((i;,(5)) x (Pi) x U x (P2) x V) '%v,5))xU = ((«,5))" 

= (^w': V . i^v . ((w,r5)) X (Pi) X U) X (P2) X 1,0 "distribute +" 

~ (([((^i»;((^2))]-(w,<5))) , "composition, embedding" 

whence our result follows because ((•)) is injective up to similarity and (v, S) was 
arbitrary. 
F Informal description of the Oblivious Transfer implementation

Given are two agents B, C; Agent B has two messages $m_{\{0,1\}}$, bit-strings of the
same length; Agent C has a message variable $m$ and a choice $c\colon \{0,1\}$ of which
of $m_{\{0,1\}}$ is to be assigned to $m$. The specification is thus

$$ \mathbf{vis}_B\ m_0, m_1;\quad \mathbf{vis}_C\ m, c;\qquad m := m_c\ . $$

Note that B does not discover $c$ and that C does not discover $m_{\neg c}$.
The implementation is, informally, as follows: 

This is the prelude of the protocol 

1. Agent B chooses privately two random bit-strings $m'_{\{0,1\}}$, to be used for
$\nabla$-encrypting $m_{\{0,1\}}$ respectively.

2. Agent C chooses privately in $c'\colon \{0,1\}$ which of the encrypting strings $m'_{\{0,1\}}$
will be revealed to her.

3. A trusted third party collects both $m'_{\{0,1\}}$ from B, collects $c'$ from C and
then reveals (only) $m'_{c'}$ to C. She throws $m'_{\neg c'}$ away, and then leaves.

From here is the main part of the protocol 

4. Agent C then tells B to encrypt and send messages in the following way:

(a) If Agent C wants $m_0$ and has $m'_0$, then she instructs B to send her
both $m_0 \nabla m'_0$ and $m_1 \nabla m'_1$. Because she has $m'_0$, she can recover $m_0$ via
$(m_0 \nabla m'_0) \nabla m'_0$; but she cannot recover $m_1$.

(b) Similarly, if Agent C wants $m_0$ but has $m'_1$ instead (of $m'_0$), then she
simply instructs B to send her both $m_0 \nabla m'_1$ and $m_1 \nabla m'_0$, i.e. with the
encryption the other way around.

(c) If Agent C wants $m_1$ and has $m'_0$: as for (4b).

(d) If Agent C wants $m_1$ and has $m'_1$: as for (4a).

The four cases (4a-4d) can be described succinctly -if cryptically- simply
by instructing B to send $m_i \nabla m'_{i \nabla c \nabla c'}$, for $i = 0, 1$.
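The four steps can be exercised with a short simulation. What follows is only a sketch under stated assumptions: it reads ∇ as bitwise exclusive-or (as the later one-time-pad remark suggests), it runs all three parties inside one process purely to check the arithmetic, and the names `xor`, `oblivious_transfer`, `pads` and `swap` are hypothetical rather than taken from the paper. In particular, modelling C's instruction to B as the single bit c ∇ c' is an assumption of this sketch.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise exclusive-or of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def oblivious_transfer(m0: bytes, m1: bytes, c: int) -> bytes:
    """Simulate Steps 1-4 in one process and return what Agent C recovers."""
    assert len(m0) == len(m1) and c in (0, 1)
    messages = (m0, m1)

    # Prelude, Step 1: B privately picks two random pads m'_0 and m'_1.
    pads = (secrets.token_bytes(len(m0)), secrets.token_bytes(len(m1)))
    # Step 2: C privately picks c'.
    c_prime = secrets.randbelow(2)
    # Step 3: the third party reveals only the pad m'_{c'} to C.
    pad_for_c = pads[c_prime]

    # Main part, Step 4 (assumed encoding): C tells B the bit c XOR c',
    # and B sends m_i XOR m'_{i XOR c XOR c'} for i = 0, 1.
    swap = c ^ c_prime
    sent = tuple(xor(messages[i], pads[i ^ swap]) for i in (0, 1))

    # C decrypts the one ciphertext that was encrypted with her pad.
    return xor(sent[c], pad_for_c)
```

Calling `oblivious_transfer(b"black", b"white", 1)` returns `b"white"` whatever value of c' happens to be drawn, since the ciphertext at index c is always encrypted with the pad at index c ∇ (c ∇ c') = c'.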



Footnote: An even more informal description is this fairy tale. An Apprentice magician is about
to graduate, and he must now choose between black- or white magic. His Sorcerer 
will allow him to read either the Black Tome or the White Tome, not both; and his 
choice must be his own, uncoerced, thus never revealed to the Academy. 

The Sorcerer summons a trusted third party Djinn who gives him two locks, one 
black and one white; and the Djinn gives a single, golden key to the Apprentice. 
On the key is a small dot that only the Apprentice can see: it is the colour of the 
matching lock. The Djinn then returns to his own dimension. 

The Apprentice tells the Sorcerer to match the lock colours to the Tomes, or to 
reverse them: it depends on whether his choice matches the colour of the dot. The 
Sorcerer then leaves; the Apprentice can unlock only the Tome of his choice; and 
-provided he locks it again- no one afterwards will know which one he read. 
Note that only Step (3) involves private messages (first between B and the third
party, and then between the third party and C), and that is only in the prelude,
before any of the actual data $m_{\{0,1\}}, c$ has appeared. Steps (1) and (2) involve no
messages at all; and the messages occurring in Step (4) are $\nabla$-encrypted already.
In effect the prelude has created a one-time pad.

A formal derivation of this implementation is given elsewhere [27].

G Alternative uncertainty measures 

G.1 Shannon Entropy

The Shannon Entropy of a (full) distribution $\delta\colon \mathbb{D}X$ is
$H.\delta := (\odot\, d\colon \delta \bullet -\lg(\delta.d))$,
that is the weighted average of the negated base-2 logarithms of its constituent
probabilities [33]. By extension, for any hyper-distribution $\Delta$ we define the
conditional Shannon Entropy $H.\Delta$ to be $(\odot\,(v,\delta)\colon \Delta \bullet H.\delta)$, the expected value of
the entropies of its support [3].

Going further, if we split up our hyper-distribution by v into its partitions,
we have an equivalent presentation of entropy as the sum of individual
partition-entropies $H.\Delta = (\sum v\colon V \bullet H.(\mathrm{fracs}.\Delta.v))$, provided we define the entropy of a
single partition, and of a single fraction, as follows:

$$
\begin{array}{rcl}
H.\Pi & := & (\sum \pi\colon \Pi \bullet H.\pi) \\
H.\pi & := & (\odot\, d\colon \pi \bullet \overline{\lg}([\pi].d)) \ ,
\end{array}
\tag{34}
$$

where we write $\overline{\lg}$ for $-\lg$ to avoid a proliferation of minus signs, and $[\pi]$ is
normalisation of the fraction $\pi$, scaling it up (if necessary) to give a distribution
again.
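The two presentations of conditional entropy can be compared numerically. The sketch below is illustrative only and rests on assumptions: a hyper-distribution is modelled as a list of (outer probability, inner distribution) pairs all sharing a single value of v, so that there is just one partition; the example numbers and the function names are hypothetical.

```python
from fractions import Fraction as F
from math import log2


def neg_lg(p):
    """Negated base-2 logarithm of a probability."""
    return -log2(float(p))


def entropy(delta):
    """Shannon entropy of a full distribution given as {value: probability}."""
    return sum(float(p) * neg_lg(p) for p in delta.values() if p > 0)


def entropy_of_hyper(hyper):
    """Expected entropy over the inner distributions of a hyper-distribution."""
    return sum(float(w) * entropy(delta) for w, delta in hyper)


def entropy_of_fraction(pi):
    """Entropy of a fraction: its own weights against its normalisation."""
    weight = sum(pi.values())
    return sum(float(p) * neg_lg(p / weight) for p in pi.values() if p > 0)


def fractions_of(hyper):
    """Scale each inner distribution by its outer probability."""
    return [{d: w * p for d, p in delta.items()} for w, delta in hyper]


# Hypothetical example: one v-value, two inner distributions.
hyper = [(F(1, 4), {"a": F(1)}), (F(3, 4), {"a": F(1, 3), "b": F(2, 3)})]
via_hyper = entropy_of_hyper(hyper)
via_fractions = sum(entropy_of_fraction(pi) for pi in fractions_of(hyper))
```

Both `via_hyper` and `via_fractions` come out as three quarters of the entropy of the distribution (1/3, 2/3), which is the agreement that the partition presentation relies on.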

The ordering $(\preceq_H)$ based on hyper-distributions

$$ \Delta_S \preceq_H \Delta_I \ :=\ (\mathrm{ft}.\Delta_S = \mathrm{ft}.\Delta_I) \land (H.\Delta_S \le H.\Delta_I) $$

is then specified, as for the Bayes order $(\preceq)$, so that $\Delta_S \preceq_H \Delta_I$ if they are
functionally equivalent and the uncertainty (the Shannon Entropy in this case) of $\Delta_I$
is no less than that of $\Delta_S$. It extends pointwise to secure programs. Furthermore
we write that $S \prec_H I$ when $S \preceq_H I$ but $I \not\preceq_H S$.

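Non-compositionality Consider again two functionally-equivalent programs
from our three-box puzzle example from §2 and §4:

$$
\begin{array}{rcl}
S & := & h{:}{\in}\ 0 \oplus 1 \oplus 2;\ \ v{:}{\in}\ \{\!\{w^{@h/2},\ b^{@1-h/2}\}\!\};\ \ v := \bot \\
I_2 & := & h{:}{\in}\ 0 \oplus 1 \oplus 2;\ \ v{:}{\in}\ \{\!\{w^{@(h\div 2)},\ b^{@1-(h\div 2)}\}\!\};\ \ v := \bot
\end{array}
$$

with final hyper-distributions

$$ \{\!\{\ (\bot, \{\!\{1^{@\frac13}, 2^{@\frac23}\}\!\}),\ \ (\bot, \{\!\{0^{@\frac23}, 1^{@\frac13}\}\!\})\ \}\!\} \tag{$\Delta'_S$} $$

and

$$ \{\!\{\ (\bot, \{\!\{2\}\!\})^{@\frac13},\ \ (\bot, \{\!\{0,1\}\!\})^{@\frac23}\ \}\!\}\ . \tag{$\Delta'_{I_2}$} $$

The Shannon entropy of $\Delta'_S$, calculated $2 \times \frac12(\frac13\,\overline{\lg}\,\frac13 + \frac23\,\overline{\lg}\,\frac23)$, is slightly more
than 0.918, exceeding the entropy of $\Delta'_{I_2}$ that, by the simpler calculation given by
$\frac13(\overline{\lg}\,1) + \frac23(2 \times \frac12\,\overline{\lg}\,\frac12)$, turns out to be exactly $\frac23$; and so $I_2 \prec_H S$.

However if we define context $C$ to be $(-;\ h := (1\ \textbf{if}\ h{=}2\ \textbf{else}\ h))$ then the
entropy of $C(I_2)$ is the same as before at $\frac23$; but the entropy of $C(S)$ is now only
a half of what it was, at $\approx 0.459$. Hence $C(I_2) \not\preceq_H C(S)$.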
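The four entropies in this counter-example can be recomputed mechanically. The sketch below takes the final hyper-distributions of S and I2 to be those of the three-box puzzle (for S, the inner distributions (1 at 1/3, 2 at 2/3) and (0 at 2/3, 1 at 1/3), each with outer probability 1/2; for I2, the point distribution on 2 with probability 1/3 and the uniform distribution on 0 and 1 with probability 2/3). That reading, and all the names used, are assumptions of the sketch.

```python
from collections import defaultdict
from fractions import Fraction as F
from math import log2


def entropy(delta):
    """Shannon entropy of a distribution given as {h: probability}."""
    return sum(float(p) * -log2(float(p)) for p in delta.values() if p > 0)


def conditional_entropy(hyper):
    """Expected entropy of the inner distributions."""
    return sum(float(w) * entropy(delta) for w, delta in hyper)


def apply_context(hyper, update):
    """Run 'h := update(h)' after the program: push each inner distribution forward."""
    result = []
    for w, delta in hyper:
        pushed = defaultdict(F)
        for h, p in delta.items():
            pushed[update(h)] += p
        result.append((w, dict(pushed)))
    return result


hyper_S = [(F(1, 2), {1: F(1, 3), 2: F(2, 3)}), (F(1, 2), {0: F(2, 3), 1: F(1, 3)})]
hyper_I2 = [(F(1, 3), {2: F(1)}), (F(2, 3), {0: F(1, 2), 1: F(1, 2)})]


def context(h):
    """The context's assignment h := (1 if h=2 else h)."""
    return 1 if h == 2 else h


h_S = conditional_entropy(hyper_S)
h_I2 = conditional_entropy(hyper_I2)
h_C_S = conditional_entropy(apply_context(hyper_S, context))
h_C_I2 = conditional_entropy(apply_context(hyper_I2, context))
```

Under these assumptions `h_I2` is below `h_S` before the context is applied, whereas afterwards `h_C_I2` is unchanged and `h_C_S` has halved, so the order between the two programs is not preserved.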



Soundness We follow initially the structure of the soundness proof for Bayes
Risk. Fix an initial split-state and construct the output hyper-distributions
$\Delta'_{\{S,I\}}$ that result from $S, I$ respectively. Then since we assume $S \sqsubseteq I$ we must
have $\Delta'_S \sqsubseteq \Delta'_I$. We now show that this implies $\Delta'_S \preceq_H \Delta'_I$.

Since $S \sqsubseteq I$ trivially guarantees that $\mathrm{ft}.\Delta'_S = \mathrm{ft}.\Delta'_I$, we need to show that the
Shannon Entropy condition in $(\preceq_H)$ is satisfied. Since we have that $H.\Delta'$ is
$(\sum v'\colon V \bullet H.(\mathrm{fracs}.\Delta'.v'))$, it is enough to show that for each $v'\colon V$ the entropy of
$\Pi'_S := \mathrm{fracs}.\Delta'_S.v'$ is no more than the entropy of $\Pi'_I := \mathrm{fracs}.\Delta'_I.v'$, provided that
$\Pi'_S \approx \Pi' \sqsubseteq \Pi'_I$ for some partition $\Pi'$ depending on $v'$.

For $\Pi'_S \approx \Pi'$ we consider the unique $\Pi$ that is the reduction of both: it is
formed in each case by adding together groups of similar fractions. From (34)
and arithmetic, we obtain immediately that $H.\Pi'_S = H.\Pi = H.\Pi'$.

For $\Pi' \sqsubseteq \Pi'_I$ we know that the fractions of $\Pi'_I$ are sums of groups of
not-necessarily-similar fractions in $\Pi'$. We consider the special case of just two
fractions $\pi_{\{1,2\}}$ in $\Pi'$ summing to a single fraction $\pi := \pi_1 + \pi_2$ in $\Pi'_I$, and look at
their relative contributions to the sum (34); we have

$$
\begin{array}{cll}
 & H.\pi \\
= & H.(\pi_1 + \pi_2) \\
= & (\odot\, d\colon (\pi_1 + \pi_2) \bullet \overline{\lg}([\pi_1 + \pi_2].d)) \\
= & (\odot\, d\colon \pi_1 \bullet \overline{\lg}([\pi_1 + \pi_2].d)) + (\odot\, d\colon \pi_2 \bullet \overline{\lg}([\pi_1 + \pi_2].d)) \\
\dagger\ \ge & (\odot\, d\colon \pi_1 \bullet \overline{\lg}([\pi_1].d)) + (\odot\, d\colon \pi_2 \bullet \overline{\lg}([\pi_2].d)) & \text{"see below"} \\
= & H.\pi_1 + H.\pi_2 \ ,
\end{array}
$$

that is that the contribution to the conditional entropy of $\pi$ on its own is at
least as great as it was when it was separated into $\pi_{\{1,2\}}$.

For "see below" we refer to the Key Lemma [36, p5] which states that for
two total distributions $\delta, \delta'$ of equal support, the weighted sum $(\odot\, d\colon \delta \bullet \overline{\lg}(\delta'.d))$
attains its minimum over $\delta'$ when $\delta = \delta'$.
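The merging step can be checked numerically on small fractions. This is a sketch on hypothetical data rather than part of the proof: `pi1` and `pi2` are two dissimilar fractions, `pi3` is a scaled copy of `pi1` (hence similar to it), and the function names are illustrative.

```python
from math import log2


def normalise(pi):
    """Scale a fraction up to a full distribution."""
    total = sum(pi.values())
    return {d: p / total for d, p in pi.items()}


def H(pi):
    """Entropy contribution of a fraction: its weights against its normalisation."""
    norm = normalise(pi)
    return sum(p * -log2(norm[d]) for d, p in pi.items() if p > 0)


def merge(pi1, pi2):
    """Pointwise sum of two fractions."""
    return {d: pi1.get(d, 0.0) + pi2.get(d, 0.0) for d in set(pi1) | set(pi2)}


pi1 = {1: 1 / 6, 2: 1 / 3}
pi2 = {0: 1 / 3, 1: 1 / 6}    # not similar to pi1
pi3 = {1: 1 / 12, 2: 1 / 6}   # similar to pi1: a scaled copy

gain_dissimilar = H(merge(pi1, pi2)) - (H(pi1) + H(pi2))
gain_similar = H(merge(pi1, pi3)) - (H(pi1) + H(pi3))
```

Here `gain_dissimilar` is strictly positive, while `gain_similar` vanishes up to rounding, matching the claim that adding similar fractions leaves the entropy unchanged and adding dissimilar ones can only increase it.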

Extending the argument similarly to multiple additions gives $H.\Pi' \le H.\Pi'_I$
as required and thus we have $H.\Pi'_S \le H.\Pi'_I$ overall. We note that the inequality
at $\dagger$ is strict when $\pi_1 \not\approx \pi_2$, because then e.g. $[\pi_1] \neq [\pi_1 + \pi_2]$. We have established

Theorem 6. Soundness of $(\sqsubseteq)$ w.r.t. $(\preceq_H)$ For all secure programs $S$ and $I$
and contexts $C$, we have that $S \sqsubseteq I$ implies $C(S) \preceq_H C(I)$. □



Footnote: If $\pi_1 \approx \pi_2$ then $[\pi_1] = [\pi_2] = [\pi_1 + \pi_2]$ and so the line marked $\dagger$ becomes an equality.

Footnote: The strictness at $\dagger$ follows from a strengthening of the Key Lemma to "... only when $\delta = \delta'$" which
is implied by the proof of Thm. 1 [op. cit.] immediately before.
Finally, when $S \sqsubseteq I$ but $S \neq I$ so that $\Delta'_S \neq \Delta'_I$ for some initial split-state we
must have $\Pi'_S \not\approx \Pi'_I$ for some final $v'$, since both those partitions are in reduced
form: that is, reduced partitions cannot be similar without actually being equal.
Thus also $\Pi' \not\approx \Pi'_I$, and so we can find particular $\pi_{\{1,2\}}$ to realise the strict
inequality at $\dagger$. That gives us

Lemma 6. Strict soundness For all hyper-distributions $\Delta_{\{1,2\}}$ we have that

$\Delta_1 \sqsubseteq \Delta_2$ but $\Delta_1 \neq \Delta_2$ implies $\Delta_1 \prec_H \Delta_2$. □

G.2 Marginal guesswork 

The Marginal guesswork [30] of a distribution $\delta\colon \mathbb{D}X$ is the least number of
guesses an attacker requires to be sure that her chance of guessing some h chosen
according to $\delta$ is at least a given probability $\alpha$. We define it

$$ W_\alpha.\delta \ :=\ (\sqcap\, i\colon 1..N \mid \sqcup^i \delta \ge \alpha) $$

where we write $\sqcup^i \delta$, or more generally $\sqcup^i \pi$ for fraction $\pi$, to mean the sum of
the $i$ greatest probabilities in $\pi$, and $N$ is the cardinality of $X$. Note that by
super-distribution of maximum over addition we have $\sqcup^i(\pi_1 + \pi_2) \le \sqcup^i \pi_1 + \sqcup^i \pi_2$
for any $i$ in the proper range. To avoid clutter, we will omit the range $1..N$ for
$i$ from here on.

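For a hyper-distribution $\Delta$ we define

$$
\begin{array}{lrcl}
 & W_\alpha.\Delta & := & (\sqcap\, i \mid (\odot\,(v,\delta)\colon \Delta \bullet \sqcup^i \delta) \ge \alpha) \\
\text{or equivalently} & W_\alpha.\Delta & := & (\sqcap\, i \mid (\sum v\colon V;\ \pi\colon \mathrm{fracs}.\Delta.v \bullet \sqcup^i \pi) \ge \alpha)
\end{array}
\tag{35}
$$

which is the least value $i$ such that if an attacker is allowed to make that many
guesses then she can discover the value of h with probability at least $\alpha$.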

Observe that our definition of $W_\alpha.\Delta$ is not the same as the conditional
marginal guesswork $(\odot\,(v,\delta)\colon \Delta \bullet W_\alpha.\delta)$ as conventionally defined [15]. We argue
that conditional marginal guesswork is not a reasonable measure of the number
of guesses required by an attacker to ensure that the probability of guessing h
in $\Delta$ is greater than $\alpha$. Consider for example the hyper-distributions

$$ \Delta_S := \{\!\{ (v, \{\!\{0\}\!\}),\ (v, \{\!\{1..4\}\!\}) \}\!\} \quad\text{and}\quad \Delta_I := \{\!\{ (v,\ \{\!\{0\}\!\} \oplus \{\!\{1..4\}\!\}) \}\!\}\ . $$

Note that $\Delta_S \sqsubseteq \Delta_I$ since the latter is obtained by merging the two split-states of
the former.

Now an attacker has more information about how h was chosen in $\Delta_S$ than in
$\Delta_I$: for $\Delta_S$ she knows not only that h is distributed according to the distribution
$\{\!\{0\}\!\} \oplus \{\!\{1..4\}\!\}$ overall (as for $\Delta_I$), but as well she knows when h was chosen from
$\{\!\{0\}\!\}$ and when h was chosen from $\{\!\{1..4\}\!\}$. However, when we set $\alpha := 1/2$ the
conditional marginal guesswork of $\Delta_S$ is $\frac12(1) + \frac12(4\alpha)$, that is 3/2, which is
higher than for $\Delta_I$, which gives only 1. This suggests that it is harder for an



Footnote: Alternatively we could write out $\{\!\{0\}\!\} \oplus \{\!\{1..4\}\!\}$ with its explicit probabilities as
$\{\!\{0^{@\frac12}, 1^{@\frac18}, 2^{@\frac18}, 3^{@\frac18}, 4^{@\frac18}\}\!\}$, but we prefer to avoid the superscripts.
attacker to guess h in $\Delta_S$ than in $\Delta_I$, in spite of the fact that the attacker knows
more about the final h-distribution in $\Delta_S$ when launching an attack.

Using our $W_\alpha$ we have $W_{1/2}.\Delta_S = W_{1/2}.\Delta_I$, that is 1 in both cases: with
just one guess at her disposal an attacker is guaranteed to guess h at least half
the time. Applying her one guess to $\Delta_S$, half the time she can guess 0 and is
sure to be right; in $\Delta_I$ she guesses 0 and will be right half the time.
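The comparison between the two measures can be replayed in a few lines. This is a sketch only: hyper-distributions are modelled as lists of (outer probability, inner distribution) pairs, the two outer probabilities in the first hyper-distribution are taken to be 1/2 each, and the function names are hypothetical.

```python
from fractions import Fraction as F


def top(i, pi):
    """Sum of the i greatest probabilities in a distribution or fraction."""
    return sum(sorted(pi.values(), reverse=True)[:i])


def guesswork(alpha, delta):
    """Least number of guesses that succeeds with probability at least alpha."""
    return min(i for i in range(1, len(delta) + 1) if top(i, delta) >= alpha)


def conventional_conditional(alpha, hyper):
    """Expected per-distribution marginal guesswork (the conventional definition)."""
    return sum(w * guesswork(alpha, delta) for w, delta in hyper)


def hyper_guesswork(alpha, hyper):
    """Least i whose expected top-i mass over the hyper is at least alpha."""
    largest = max(len(delta) for _, delta in hyper)
    return min(
        i
        for i in range(1, largest + 1)
        if sum(w * top(i, delta) for w, delta in hyper) >= alpha
    )


uniform_1_to_4 = {h: F(1, 4) for h in range(1, 5)}
delta_S = [(F(1, 2), {0: F(1)}), (F(1, 2), uniform_1_to_4)]
merged = {0: F(1, 2), **{h: F(1, 8) for h in range(1, 5)}}
delta_I = [(F(1), merged)]

half = F(1, 2)
conventional_S = conventional_conditional(half, delta_S)
conventional_I = conventional_conditional(half, delta_I)
ours_S = hyper_guesswork(half, delta_S)
ours_I = hyper_guesswork(half, delta_I)
```

With these inputs the conventional measure ranks the first hyper-distribution as harder to attack (3/2 against 1), while the definition used here gives 1 for both.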

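The ordering between hyper-distributions based on marginal guesswork is

$$ \Delta_S \preceq_{W_\alpha} \Delta_I \ :=\ (\mathrm{ft}.\Delta_S = \mathrm{ft}.\Delta_I) \land (W_\alpha.\Delta_S \le W_\alpha.\Delta_I) $$

which extends pointwise to programs.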

Non-compositionality When $\alpha$ is not zero, marginal guesswork -like the other
measures- is non-compositional for our subset of programs. For such an $\alpha \neq 0$
take, for example, functionally equivalent programs

S:= U:eiO,l,2l^(Si-N..~ll; 

v:eiiw®Kb®^'-H ifh>Oelse Iw^'^^b®^); 
v:=_L 

/:= h:eiO,l,2l^(Si-N..-ll; 

v:e (|w;®^-2 6®i-^-2^ ifh>Oelse iw®-3,b®^); 
v:=_L 

such that if a=l then iV=:0 else N > 3x^—^. These programs have the final 
output distributions 

and U±,i2}^®i^N...-l}rK {±AOA} c® i-N...-llfl} . {A',) 

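We can calculate that both $W_\alpha.\Delta'_S$ and $W_\alpha.\Delta'_I$ are 2, and so $S \preceq_{W_\alpha} I$, but that
for context $C$ defined as $(-;\ h := (h \div 2\ \textbf{if}\ h{>}0\ \textbf{else}\ h))$ we have $W_\alpha.\Delta'_I$ is only
1, while $W_\alpha.\Delta'_S$ remains at 2, and so $C(S) \not\preceq_{W_\alpha} C(I)$.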

Soundness 

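Lemma 7. $(\sqsubseteq)$ implies $(\preceq_{W_\alpha})$ For all hyper-distributions $\Delta_S$ and $\Delta_I$ and
probabilities $\alpha$, if $\Delta_S \sqsubseteq \Delta_I$ then also $\Delta_S \preceq_{W_\alpha} \Delta_I$; consequently $S \sqsubseteq I$ implies $S \preceq_{W_\alpha} I$.
Proof: From (35) and the definition of refinement (Def. 2, §6.4) it is enough
to show that for any partition $\Pi$ and $i$ in range that (i) if the fractions in
$\Pi$ are similar then $(\sum \pi\colon \Pi \bullet \sqcup^i \pi) = \sqcup^i(\sum \pi\colon \Pi)$ else (ii) $(\sum \pi\colon \Pi \bullet \sqcup^i \pi) \ge \sqcup^i(\sum \pi\colon \Pi)$.
To show (ii) we have by generalising $\sqcup^i(\pi_1 + \pi_2) \le \sqcup^i \pi_1 + \sqcup^i \pi_2$ that
indeed

$$ (\sum \pi\colon \Pi \bullet \sqcup^i \pi) \ \ge\ \sqcup^i(\sum \pi\colon \Pi)\ , $$

and for (i) we can replace inequality by equality since $(\sqcup^i)$ distributes over
summation in that case. □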
Theorem 7. Soundness of $(\sqsubseteq)$ w.r.t. $(\preceq_{W_\alpha})$ For all probabilities $\alpha$, secure
programs $S, I$ and contexts $C$ we have that $S \sqsubseteq I$ implies $C(S) \preceq_{W_\alpha} C(I)$.

Proof: Lem. 7 and monotonicity of $(\sqsubseteq)$ (Thm. 2 from §6.5). □

G.3 Guessing entropy 

The guessing entropy [19] of a distribution $\delta$ is the (least) average number of
guesses required to guess h in $\delta$. It is equivalent to the average $\alpha$-marginal
guesswork over all values of $\alpha$ [30], and we define it

$$ W.\delta \ :=\ (\sum i\colon 1..N \bullet \sqcap^i \delta)\ , $$

where $\sqcap^i \delta$ is the sum of the $i$ smallest probabilities in $\delta$. Note that by
subdistribution of minimum over addition we have $\sqcap^i(\pi_1 + \pi_2) \ge \sqcap^i \pi_1 + \sqcap^i \pi_2$ for any
$i$ in range. For hyper-distribution $\Delta$ we define the conditional guessing entropy,
thus

$$
\begin{array}{lrcl}
 & W.\Delta & := & (\odot\,(v,\delta)\colon \Delta \bullet W.\delta) \\
\text{or equivalently} & W.\Delta & := & (\sum v\colon V;\ \pi\colon \mathrm{fracs}.\Delta.v \bullet W.\pi)\ ,
\end{array}
$$

where $W.\pi$ is defined in the same way as $W.\delta$. We define the ordering by

$$ \Delta_S \preceq_W \Delta_I \ :=\ (\mathrm{ft}.\Delta_S = \mathrm{ft}.\Delta_I) \land (W.\Delta_S \le W.\Delta_I)\ , $$

which extends pointwise to secure programs.

Non-compositionality To show non-compositionality of ordering $(\preceq_W)$ we
refer again (as we did for Shannon entropy in §G.1) to the functionally equivalent
programs $S$ and $I_2$. First we calculate that

$$ W.\Delta'_S \ =\ 2 \times \tfrac12\left(\tfrac13 + (\tfrac13 + \tfrac23)\right) \ =\ \tfrac43 $$

and

$$ W.\Delta'_{I_2} \ =\ \tfrac13(1) + \tfrac23\left(\tfrac12 + (\tfrac12 + \tfrac12)\right) \ =\ \tfrac43\ , $$

so that we have $I_2 \preceq_W S$. Again taking context $C$ to be $(-;\ h := (1\ \textbf{if}\ h{=}2\ \textbf{else}\ h))$
we get that the guessing entropy of $C(S)$ is reduced to $\frac76$ while that of $C(I_2)$ is
still $\frac43$, and hence $C(I_2) \not\preceq_W C(S)$.
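These guessing-entropy figures can be reproduced exactly with rational arithmetic. The sketch assumes the same final hyper-distributions for S and I2 as in the three-box puzzle (inner distributions (1 at 1/3, 2 at 2/3) and (0 at 2/3, 1 at 1/3) with outer probability 1/2 each for S; the point distribution on 2 at 1/3 and the uniform distribution on 0 and 1 at 2/3 for I2); the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F


def guessing_entropy(pi):
    """Sum over i of the i smallest probabilities, i.e. the expected guess count."""
    total = F(0)
    running = F(0)
    for p in sorted(pi.values()):   # ascending order
        running += p
        total += running
    return total


def conditional_guessing_entropy(hyper):
    """Expected guessing entropy of the inner distributions."""
    return sum(w * guessing_entropy(delta) for w, delta in hyper)


def apply_context(hyper, update):
    """Run 'h := update(h)' after the program: push each inner distribution forward."""
    result = []
    for w, delta in hyper:
        pushed = defaultdict(F)
        for h, p in delta.items():
            pushed[update(h)] += p
        result.append((w, dict(pushed)))
    return result


hyper_S = [(F(1, 2), {1: F(1, 3), 2: F(2, 3)}), (F(1, 2), {0: F(2, 3), 1: F(1, 3)})]
hyper_I2 = [(F(1, 3), {2: F(1)}), (F(2, 3), {0: F(1, 2), 1: F(1, 2)})]


def context(h):
    """The context's assignment h := (1 if h=2 else h)."""
    return 1 if h == 2 else h


w_S = conditional_guessing_entropy(hyper_S)
w_I2 = conditional_guessing_entropy(hyper_I2)
w_C_S = conditional_guessing_entropy(apply_context(hyper_S, context))
w_C_I2 = conditional_guessing_entropy(apply_context(hyper_I2, context))
```

Under these assumptions the two programs start with equal guessing entropy, and only that of S drops once the context is applied, which is what breaks compositionality.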

Soundness 

Lemma 8. $(\sqsubseteq)$ implies $(\preceq_W)$ For all hyper-distributions $\Delta_S$ and $\Delta_I$, we have
that $\Delta_S \sqsubseteq \Delta_I$ implies $\Delta_S \preceq_W \Delta_I$; and consequently $S \sqsubseteq I$ implies $S \preceq_W I$.

Proof: As in the proof of soundness for marginal guesswork, it is enough
to show that for any partition $\Pi$ (i) if the fractions in $\Pi$ are similar then
$(\sum \pi\colon \Pi \bullet W.\pi) = W.(\sum \pi\colon \Pi)$ else (ii) $(\sum \pi\colon \Pi \bullet W.\pi) \le W.(\sum \pi\colon \Pi)$. For
(ii) we reason:

Footnote: If wlog the four probabilities a, b, c, d are ordered greatest to least, then the best
strategy is to guess (the value associated with) a first, and then to go on to guess
b, c, d in order as necessary. The average number of guesses needed overall is then
a + 2b + 3c + 4d, that is d + (d+c) + (d+c+b) + (d+c+b+a).
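$$
\begin{array}{cll}
 & (\sum \pi\colon \Pi \bullet W.\pi) \\
= & (\sum \pi\colon \Pi \bullet (\sum i \bullet \sqcap^i \pi)) & \text{"definition W for a partition"} \\
= & (\sum i \bullet (\sum \pi\colon \Pi \bullet \sqcap^i \pi)) & \text{"swap summations"} \\
\le & (\sum i \bullet \sqcap^i(\sum \pi\colon \Pi)) & \text{"subdistribute minimisation"} \\
= & W.(\sum \pi\colon \Pi)\ . & \text{"definition W for partition"}
\end{array}
$$

When all the fractions $\pi$ in $\Pi$ are similar, we can replace the inequality in the
second-last step with equality, establishing (i). □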

Theorem 8. Soundness of $(\sqsubseteq)$ w.r.t. $(\preceq_W)$ For all programs $S$ and $I$ and
contexts $C$ we have that $S \sqsubseteq I$ implies $C(S) \preceq_W C(I)$.

Proof: Immediate from Lem. 8 and monotonicity of $(\sqsubseteq)$ (Thm. 2 in §6.5). □



